ABSTRACT

We present a current catalog of 21 cm HI line sources extracted from the Arecibo Legacy
Fast Arecibo L-band Feed Array (ALFALFA) survey over ~2800 deg² of sky: the α.40 catalog.
Covering 40% of the final survey area, the α.40 catalog contains 15855 sources in the regions
07h30m < R.A. < 16h30m, +04° < Dec. < +16° and +24° < Dec. < +28°, and 22h <
R.A. < 03h, +14° < Dec. < +16° and +24° < Dec. < +32°. Of those, 15041 are certainly
extragalactic, yielding a source density of 5.3 galaxies per deg², a factor of 29 improvement over
the catalog extracted from the HI Parkes All Sky Survey. In addition to the source centroid
positions, HI line flux densities, recessional velocities and line widths, the catalog includes the
coordinates of the most probable optical counterpart of each HI line detection, and a separate
compilation provides a crossmatch to identifications given in the photometric and spectroscopic
catalogs associated with the Sloan Digital Sky Survey Data Release 7. Fewer than 2% of the
extragalactic HI line sources cannot be identified with a feasible optical counterpart; some of
those may be rare OH megamasers at 0.16 < z < 0.25. A detailed analysis is presented of
the completeness, width-dependent sensitivity function and bias inherent in the α.40 catalog.
The impact of survey selection, distance errors, current volume coverage and local large-scale
structure on the derivation of the HI mass function is assessed. While α.40 does not yet provide
a completely representative sampling of cosmological volume, derivations of the HI mass function
using future data releases from ALFALFA will further improve both statistical and systematic
uncertainties.

Subject headings: galaxies: spiral — galaxies: distances and redshifts — galaxies: luminosity
function, mass function — radio lines: galaxies — catalogs — surveys 



1. Introduction



The evolution of baryons within their dark matter halos and the morphologies of the resulting systems 
depend on the merger and accretion history of the parent halos. Major efforts of galaxy evolution studies 
today focus on how galaxies acquire the gas which fuels their star formation and what processes drive the 
distinctions between the red sequence and the blue cloud. Still, our view of the extragalactic universe is 
only as complete as our methods for cataloging the galaxies that populate it. While the public wide area 
optical/IR and associated spectroscopic surveys are good at detecting luminous ellipticals, bright spirals and 
bursting or active galaxies, they are substantially less complete in tracing the low surface brightness, dwarf 
and gas-rich galaxy populations that actually dominate the local population. Each catalog derived from 
an individual survey has its own built-in limitations and biases which affect our ability to construct a true 
census of the present day universe. 

Because of its relatively simple physics, the HI line provides a useful tracer of the cool gas mass and
of the star formation potential in nearby galaxies and probes the very population of modest-luminosity,
gas-rich objects which are often underrepresented in surveys selected by optical/IR properties. While it is clear
that most stars form out of molecular rather than atomic hydrogen, the molecular clouds themselves develop
through the collapse of overdensities in the more diffuse, neutral medium. Thus, while the connection of HI
to star formation is indirect on small scales, the global HI content serves as a tracer of relative star formation
potential. However, at present, HI line measurements yield HI masses M_HI for far fewer galaxies than those for
which stellar masses M_* are available from optical/IR wide area surveys. In fact, only now are HI surveys adequate
in terms of volume sensitivity to sample a cosmologically significant volume (Martin et al. 2010).



After the pioneering results delivered by small-scale surveys such as the Arecibo HI Strip Survey (AHISS:
Zwaan et al. 1997) and the Arecibo Dual Beam Survey (ADBS: Rosenberg & Schneider 2002), the advent of
multi-feed array receivers on large single dish telescopes made possible wide-area 21 cm HI line surveys, such
as the HI Parkes All-Sky Survey (HIPASS: Barnes et al. 2001; Meyer et al. 2004; Wong et al. 2006) and
the companion HI Jodrell Bank All-Sky Survey (HIJASS: Lang et al. 2003). While covering a large fraction
of the sky, these surveys failed to sample a cosmologically fair volume because their mean depth was too
shallow, typically < 40 Mpc, and they were limited in both angular and spectral resolution and in sensitivity.
As a result, HIPASS sampled only sparsely both the most HI-rich (but rare) objects and the lowest
halo mass systems (detectable only if very nearby and with very narrow HI line widths), and, because of
the large Parkes antenna beam (15.5'), suffered from confusion in the identification of optical counterparts
(OCs).

The advent of a similar seven-feed array at Arecibo ("ALFA", the Arecibo L-band Feed Array) has
enabled a second-generation wide-area extragalactic HI line survey, ALFALFA, the Arecibo Legacy Fast
ALFA survey (Giovanelli et al. 2005a,b; Giovanelli 2008; Haynes 2008). Initiated in February 2005, survey
observations are now more than 90% complete. In this paper, we present the catalog of HI detections
covering about 40% of the planned survey sky area, referred to hereafter as the α.40 catalog. Both by
design and because of improvements made possible by the accumulation and analysis of more survey data,
the catalog presented here both extends and supersedes earlier ones presented by Giovanelli et al. (2007),
Saintonge et al. (2008), Kent et al. (2008), Martin et al. (2009) and Stierwalt et al. (2009). In addition, the
ALFALFA data release presented here includes, where applicable, a cross reference to the optical survey
dataset corresponding to Data Release 7 (DR7) of the Sloan Digital Sky Survey (SDSS: Abazajian et al.
2009).



The availability now of a large body of ALFALFA data, constituting 40% of the expected final
survey, allows us to undertake an examination of the characteristics of its catalog of HI sources. Martin et al.
(2010) and Toribio et al. (2011a) have presented earlier considerations of survey characteristics for subsets of the
α.40 catalog, specifically in the context of using the ALFALFA survey to derive the HI mass function (HIMF)
and to establish a standard of normal HI content for galaxies in low density environments, respectively. Here,
we examine the full α.40 catalog, discuss its identification of optical counterparts, and compare parameters
derived from its measurements with those available in the previous compilation of targeted HI line
observations presented by Springob et al. (2005a). We also present a more detailed look at the completeness of
α.40 and how HI source catalog limitations in general can affect measurements of the HIMF.

This paper is organized as follows: In §2 we discuss the observational strategy, sky coverage, and data
processing associated with the production of the ALFALFA dataset and its final data products. §3 presents
the α.40 catalog of HI sources. The identification of the optical counterparts (OCs) of the HI sources is
discussed in §4. In that section, we present the crossmatch of the α.40 catalog to the SDSS DR7 database
and discuss those circumstances under which the ALFALFA detection is not associated with an OC. A
comparison of the HI line parameters derived from the ALFALFA survey with those extracted from the
large targeted HI dataset presented in Springob et al. (2005a) is used in §5 to validate the photometric and
spectral calibration underlying the ALFALFA source parameters. An analysis of the survey completeness
and reliability is presented in §6, followed in §7 by a discussion of how the α.40 survey characteristics impact
its cosmological applications, in particular, the derivation of the HIMF. A brief summary of the main points
of this paper is given in §8.



2. Data 



The ALFALFA observing strategy has been discussed in detail in Giovanelli et al. (2005a) and Kent & Giovanelli
(2011). Of particular note to this data release, observations during a given observing session use the ALFA
seven-beam receiver parked on the meridian with data acquired in "almost fixed" drift-scan mode; minor
motion of the telescope is permitted so that the position of the central beam tracks in constant J2000
declination. With the feed arm positioned along the meridian at azimuths near 180° (for declinations north of
the Arecibo zenith at Dec. = 18°21') or 360° (for declinations south of zenith), the feed array is rotated
by 19° so that the seven beams sweep out tracks equally spaced in declination by about 2.1'. In nearly
all circumstances, a given observing run is dedicated to a single declination track. The 2-D (time versus
frequency) drift scan datasets are converted from FITS to IDL format and run through an initial bandpass
calibration and subtraction, normally within 24 hours of acquisition.
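Why a rotation of about 19° produces equally spaced tracks can be checked with a short geometric sketch. It assumes an idealized layout in which the six outer beams sit on a regular hexagon around the central beam; the real ALFA footprint is not exactly regular, so this is an approximation, and the function names are hypothetical. Requiring the declination offsets r sin(θ + k·60°) to form a ladder of seven equal steps (sin(θ + 60°) = 3 sin θ and sin(θ + 120°) = 2 sin θ) gives tan θ = √3/5, i.e. θ ≈ 19.1°.

```python
import math


def dec_offsets(theta_deg: float, radius: float = 1.0) -> list:
    """Sorted declination offsets of an idealized 7-beam array rotated by theta_deg.

    Hypothetical model: six outer beams on a regular hexagon of the given radius
    around a central beam. The real ALFA footprint is not exactly regular.
    """
    offsets = [0.0]  # the central beam tracks the nominal declination
    for k in range(6):
        angle = math.radians(theta_deg + 60.0 * k)
        offsets.append(radius * math.sin(angle))
    return sorted(offsets)


def spacing_spread(theta_deg: float) -> float:
    """Largest minus smallest gap between adjacent tracks (0 means perfectly uniform)."""
    offs = dec_offsets(theta_deg)
    gaps = [b - a for a, b in zip(offs, offs[1:])]
    return max(gaps) - min(gaps)


def equal_spacing_angle_deg() -> float:
    """Rotation with sin(t + 60) = 3 sin(t) and sin(t + 120) = 2 sin(t), i.e. tan(t) = sqrt(3)/5."""
    return math.degrees(math.atan(math.sqrt(3.0) / 5.0))


if __name__ == "__main__":
    print(f"ideal rotation: {equal_spacing_angle_deg():.2f} deg")
    for theta in (0.0, 19.0):
        print(f"theta = {theta:4.1f} deg -> gap spread = {spacing_spread(theta):.4f} x radius")
```

Without rotation, pairs of outer beams share a declination and only three distinct tracks are swept; at 19° the seven tracks are uniform to within about 1.5% of the step size in this idealized model.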

In contrast to traditional total power, position switched pointed observations, a drift-scan survey (of 
which ALFALFA is certainly not the first example) collects spectra continuously (almost) without moving 
the telescope. In the case of the ALFALFA survey, the sampling rate is 1 Hz, i.e. a spectrum of 4096 
spectral channels (a "record") is recorded every second for each polarization of every beam of the feed 
array. The slowly changing characteristics of the bandpass with time can thus be monitored effectively.
The ALFALFA pipeline does so by separately monitoring the behavior of each spectral channel across the
time domain, through a robust, low-order polynomial fit (which skips over sources), outside of the spectral
region dominated by Galactic emission. For each 600-record unit (a 10 minute drift "scan"), we thus obtain
a two-dimensional map of the bandpass which can be "subtracted" from each spectral record. Such "sky
subtraction" is thus conceptually similar to that of the traditional position-switching mode, although the
duration of the "off" is much longer than that of the "on" source, gaining a factor of √2 in sensitivity with
respect to standard position-switching observations. During the same processing step, continuum subtraction
is also performed, and a separate continuum map is recorded.
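The per-channel bandpass estimate can be illustrated with a minimal numpy sketch. It is not the ALFALFA pipeline code: the polynomial order, the iterative sigma clipping used here to make the fit "robust" and to skip over sources, and all function names are assumptions for illustration. A second helper makes the √2 argument explicit: differencing one "on" record against the mean of N equally noisy "off" records gives a noise of σ√(1 + 1/N), which tends to σ for large N, compared with σ√2 for an equal-duration on/off pair.

```python
import numpy as np


def fit_bandpass(scan: np.ndarray, order: int = 2, n_iter: int = 5,
                 clip: float = 3.0) -> np.ndarray:
    """Per-channel robust polynomial model of a drift scan's bandpass.

    scan has shape (n_records, n_channels), e.g. (600, 4096) for a 10 minute
    scan sampled at 1 Hz. For every channel a polynomial in time is fitted;
    records more than clip * sigma from the fit (such as a source drifting
    through the beam) are iteratively dropped so the fit skips over them.
    """
    n_rec, n_chan = scan.shape
    t = np.linspace(-1.0, 1.0, n_rec)  # normalized time keeps polyfit well conditioned
    model = np.empty((n_rec, n_chan))
    for ch in range(n_chan):
        y = scan[:, ch].astype(float)
        keep = np.ones(n_rec, dtype=bool)
        coeffs = np.polyfit(t, y, order)  # first pass uses every record
        for _ in range(n_iter):
            resid = y - np.polyval(coeffs, t)
            sigma = resid[keep].std()
            new_keep = np.abs(resid) < clip * sigma
            if sigma == 0 or new_keep.sum() <= order or np.array_equal(new_keep, keep):
                break
            keep = new_keep
            coeffs = np.polyfit(t[keep], y[keep], order)  # refit without outliers
        model[:, ch] = np.polyval(coeffs, t)
    return model


def subtract_bandpass(scan: np.ndarray, **kwargs) -> np.ndarray:
    """Return the bandpass-subtracted ("sky-subtracted") scan."""
    return scan - fit_bandpass(scan, **kwargs)


def difference_noise(sigma: float, n_off: int) -> float:
    """Noise of one 'on' record minus the mean of n_off independent 'off' records."""
    return sigma * np.sqrt(1.0 + 1.0 / n_off)


# Equal on/off durations (n_off = 1) give sigma*sqrt(2); a long "off" tends to sigma.
print(difference_noise(1.0, 1), difference_noise(1.0, 599))
```

Looping over channels keeps the per-channel masks independent, which is the point of the robust fit; a vectorized `np.polyfit` over all columns at once would be faster but could not exclude a source from only the channels it occupies.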

For spectral channels affected by Galactic HI emission, such "sky subtraction" is not an option, and the 
bandpass subtraction cannot be applied in the same manner as for spectral channels away from the Galactic 
signal. In this case, the spectral shape of the bandpass across the Galactic emission region is adopted as a 
linear interpolation between the two Galactic emission-free sides of the spectrum. Thus, the flux calibration 
of Galactic features processed by the standard ALFALFA pipeline is not accurate. 
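A minimal sketch of that interpolation follows, assuming the Galactic emission window is known as a range of channel indices. The window bounds and the function name are hypothetical, and each emission-free side is represented by the single channel adjacent to the window, which is a simplification.

```python
import numpy as np


def interpolate_over_galactic(bandpass, lo: int, hi: int) -> np.ndarray:
    """Replace channels lo..hi (inclusive) with a straight line between their neighbours.

    Hypothetical simplification: the two emission-free "sides" are represented by
    the single channels lo - 1 and hi + 1; an average over several channels on
    each side would be less sensitive to noise.
    """
    out = np.asarray(bandpass, dtype=float).copy()
    if lo < 1 or hi > out.size - 2 or lo > hi:
        raise ValueError("window needs emission-free channels on both sides")
    chans = np.arange(lo, hi + 1)
    out[lo:hi + 1] = np.interp(chans, [lo - 1, hi + 1], [out[lo - 1], out[hi + 1]])
    return out


bp = np.array([1.0, 2.0, 50.0, 60.0, 5.0, 6.0])  # channels 2-3 contaminated
print(interpolate_over_galactic(bp, 2, 3))  # [1. 2. 3. 4. 5. 6.]
```

Any genuine spectral structure of the bandpass inside the window is replaced by the straight line, which is why fluxes of Galactic features from this step should not be trusted.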

Each 2-D bandpass-subtracted dataset for each beam and each polarization is examined interactively 
and flagged for radio frequency interference (RFI); regions characterized by lowered quality (due to standing 
waves, gain instabilities, etc.) are assigned a lower weight. While this step (known as "flagbb") is laborious,
the facts that the continuum information is retained and that the RFI is not median filtered away enable
further use of the dataset to look for HI absorption, for the derivation of upper limits at arbitrary positions
in 3-D, and for stacking analysis (Fabello et al. 2011). The flattened and flagged 2-D line and continuum
maps are archived as Level I datasets.

Once the set of drift scans providing full coverage for a complete strip in declination is flagged in this
manner, the set of evenly gridded data cubes is generated. Details of the gridding process are given in
Kent & Giovanelli (2011) and summarized here. The grids are square in the angular dimension, 2.4° on a
side, evenly sampled at 1' spacing. Their center positions on the sky are spaced 8 minutes apart in R.A. and
centered on odd integer declinations; the spatial dimensions of a grid are 144 by 144 pixels. For convenient
access using modest data processors, each spatial grid is split into four partially overlapping subgrids, each
covering 1024 frequency channels. The grid generation algorithm also converts the spectral intensities from
units of antenna temperature to flux density in mJy/beam, correcting for zenith angle variations in the gain of
the telescope. A first step in the examination of the grids performs an astrometric fit to the continuum sources
within them; this fit is then used to subtract off the residual telescope pointing errors (Giovanelli et al.
2007; Kent et al. 2008; Kent & Giovanelli 2011). Grids are then flatfielded and rebaselined in both the angular
and spectral dimensions to improve their quality by accounting for variations in gain, calibration and other
systematic blemishes. "Flatfielding" here corresponds to the process by which pixel-to-pixel variations within
each channel map, caused mainly by continuum fluctuations, are accounted for. For spectral channels away
from Galactic emission, extragalactic HI sources are typically small in comparison with the angular size of
ALFALFA data cubes ("grids" of 2.4° × 2.4°). Large-scale variations in the continuum level which may
not have been effectively removed by the bandpass subtraction procedure can be identified by robust-fitting
a two-dimensional surface (in the angular domain) to each channel map and subtracting it. In the absence of
very strong continuum sources, this correction is generally small and it does not affect noise statistics in any
significant way.
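The grid layout and the angular flatfielding step can both be sketched in a few lines of numpy. The grid constants (2.4° side, 1' pixels, centers 8 minutes of R.A. apart on odd integer declinations) come from the text; the helper names, the handling of R.A. wrap-around, and the choice of a first-order polynomial surface with iterative sigma clipping as the "robust fit" are assumptions for illustration, not the ALFALFA implementation.

```python
import math
import numpy as np

GRID_SIDE_DEG = 2.4       # angular size of a grid
PIXEL_ARCMIN = 1.0        # sampling interval
N_PIX = int(round(GRID_SIDE_DEG * 60.0 / PIXEL_ARCMIN))  # 144 pixels per side
RA_STEP_H = 8.0 / 60.0    # grid centers 8 minutes of R.A. apart


def grid_centers(ra_start_h, ra_end_h, dec_min, dec_max):
    """Hypothetical helper: (R.A. [h], Dec. [deg]) centers of grids covering a region.

    R.A. centers start at ra_start_h and step by 8 minutes (wrapping through 24h);
    Dec. centers are the odd integers in [dec_min, dec_max].
    """
    span_h = (ra_end_h - ra_start_h) % 24.0
    n_steps = int(math.floor(span_h / RA_STEP_H + 1e-9))
    ras = [(ra_start_h + i * RA_STEP_H) % 24.0 for i in range(n_steps + 1)]
    decs = [d for d in range(math.ceil(dec_min), math.floor(dec_max) + 1) if d % 2 == 1]
    return [(ra, dec) for dec in decs for ra in ras]


def flatfield_channel_map(cmap, order=1, n_iter=5, clip=3.0):
    """Subtract a robustly fitted 2-D polynomial surface from one channel map.

    The surface (terms x**i * y**j with i + j <= order) is fitted by least squares;
    pixels beyond clip * sigma, e.g. compact HI sources, are iteratively excluded.
    """
    ny, nx = cmap.shape
    y, x = np.mgrid[0:ny, 0:nx]
    x = (x - nx / 2.0) / nx  # normalized coordinates for numerical stability
    y = (y - ny / 2.0) / ny
    terms = [x**i * y**j for i in range(order + 1) for j in range(order + 1 - i)]
    a = np.stack([term.ravel() for term in terms], axis=1)
    z = np.asarray(cmap, dtype=float).ravel()
    keep = np.ones(z.size, dtype=bool)
    coeffs = np.linalg.lstsq(a, z, rcond=None)[0]
    for _ in range(n_iter):
        resid = z - a @ coeffs
        sigma = resid[keep].std()
        new_keep = np.abs(resid) < clip * sigma
        if sigma == 0 or new_keep.sum() < a.shape[1] or np.array_equal(new_keep, keep):
            break
        keep = new_keep
        coeffs = np.linalg.lstsq(a[keep], z[keep], rcond=None)[0]
    return np.asarray(cmap, dtype=float) - (a @ coeffs).reshape(ny, nx)


print(N_PIX, len(grid_centers(7.5, 8.0, 4, 16)))
```

With odd-integer declination centers 2° apart and 2.4° grids, adjacent grids in declination overlap by 0.4°, which is what lets a source near a grid edge be measured again in the neighbouring cube.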

After the angular flat fielding is performed, residual, localized spectral baseline features are also removed 
by subtracting low order polynomial fits to the signal free portions of the spectral domain around emission 
features. These arise, for example, from standing waves produced by multiple reflections of continuum source 
emission within the optical path. 
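A minimal sketch of such local rebaselining, assuming the emission feature's channel range is already known; the margin of signal-free channels, the polynomial order and the function name are hypothetical choices, not values from the ALFALFA pipeline.

```python
import numpy as np


def remove_local_baseline(spectrum, signal_lo, signal_hi, margin=50, order=1):
    """Subtract a low-order polynomial fitted to signal-free channels around a feature.

    Channels signal_lo..signal_hi (inclusive) hold the emission; up to `margin`
    channels on either side (a hypothetical default) form the signal-free region
    used for the fit. Returns the channel indices and the baseline-subtracted
    segment.
    """
    spec = np.asarray(spectrum, dtype=float)
    start = max(0, signal_lo - margin)
    stop = min(spec.size, signal_hi + margin + 1)
    chans = np.arange(start, stop)
    segment = spec[start:stop]
    off = (chans < signal_lo) | (chans > signal_hi)  # signal-free channels only
    coeffs = np.polyfit(chans[off], segment[off], order)
    return chans, segment - np.polyval(coeffs, chans)


# Toy spectrum: a sloping baseline plus a flat-topped 10 mJy feature.
spec = 0.5 + 0.01 * np.arange(400)
spec[200:220] += 10.0
chans, cleaned = remove_local_baseline(spec, 200, 219)
print(cleaned[(chans >= 200) & (chans <= 219)].mean())  # close to 10.0
```

Fitting only the channels flanking the feature keeps the emission itself out of the fit, so the subtraction removes the local ripple without eating into the line flux.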

Signal extraction is applied following Saintonge (2007a), and once a catalog of candidate detections
has been obtained, the grid is interactively examined, the global profiles are extracted, fluxes are measured,
OCs are identified and remarks are recorded. It should be noted that this interactive process improves the
definition of source parameters beyond the model fitting used by the automatic signal extractor; this point,
and the resultant reliability and completeness of the catalog, are discussed more fully in §7. The final catalog
of sources is constructed following a process of culling poorer quality detections where a source is contained
in adjacent overlapping grids and running a series of data quality checks.

The catalog presented here supersedes previous ALFALFA data releases for several reasons, mainly
having to do with (1) the increased size of the available dataset, which yields a better understanding of
pointing errors, gain variations and other instrumental artifacts, (2) improved SDSS coverage since the first
catalogs were produced, (3) improvements in the algorithm used to make global profile measurements and (4)
increased contiguous coverage. Some earlier measurements tended to underestimate fluxes for the brightest and
more extended sources, a systematic effect for which a correction is now applied (see §5 for the comparison of
flux density measurements with published values). In most cases, changes to the flux density measurements
included in earlier data releases are minor, but the current catalog is intended to replace the earlier ones
entirely. It should be noted that further revisions of parameters for sources located near edges of the current
grid coverage will come in the future in those cases when a newer grid in an adjacent strip better encompasses
the source or contributes a higher quality dataset. By its nature as a cumulative drift scan survey, the harvest
of ALFALFA will both grow and improve over time.

The full ALFALFA survey is intended to cover 7000 deg² of sky in two regions of high Galactic
latitude within 18° of the Arecibo zenith, covering all declinations 0° < Dec. < +36°. Since all
observations are conducted during nighttime hours, the two regions are referred to as "spring" and "fall".
The "spring" region extends over 07h30m < R.A. < 16h30m, while the "fall" ALFALFA region encompasses
22h < R.A. < 03h. Some sources are found outside the stated R.A. boundaries where the actual drift
scan observations extended beyond the nominal map area. Some priority has been given to completing
areas within the SDSS spectroscopic survey footprint, and the pace of observing has been dictated by the
availability of telescope time. Figure 1 illustrates the area of the sky contained in the α.40 catalog presented
here: regions 07h30m < R.A. < 16h30m, +04° < Dec. < +16° and +24° < Dec. < +28° (the "spring"
region) and 22h < R.A. < 03h, +14° < Dec. < +16° and +24° < Dec. < +32° (the "fall" region).
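As a consistency check on the quoted ~2800 deg², the solid angle of each R.A./Dec. box can be computed as ΔR.A. (in radians) × (sin Dec_hi − sin Dec_lo), converted to square degrees. The sketch below applies this to the nominal α.40 boundaries only (the function name is hypothetical); it ignores the extra coverage outside the stated R.A. limits and gives roughly 2750 deg², in reasonable agreement with the quoted value.

```python
import math

SQDEG_PER_SR = math.degrees(1.0) ** 2  # about 3282.8 deg^2 per steradian


def box_area_deg2(ra_span_h: float, dec_lo: float, dec_hi: float) -> float:
    """Solid angle of an R.A./Dec. box: dRA [rad] * (sin dec_hi - sin dec_lo), in deg^2."""
    d_ra = math.radians(ra_span_h * 15.0)  # 1 h of R.A. = 15 deg
    return SQDEG_PER_SR * d_ra * (math.sin(math.radians(dec_hi)) - math.sin(math.radians(dec_lo)))


spring = box_area_deg2(9.0, 4, 16) + box_area_deg2(9.0, 24, 28)  # 07h30m-16h30m
fall = box_area_deg2(5.0, 14, 16) + box_area_deg2(5.0, 24, 32)   # 22h-03h
print(f"spring {spring:.0f}, fall {fall:.0f}, total {spring + fall:.0f} deg^2")
```

The sin(Dec) factor matters here: a naive ΔR.A. × ΔDec. product would overestimate the area of these mid-declination strips.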

3. Catalog Presentation

We present in Table 1 the measured parameters for 15855 detections, 15041 of which are certainly
associated with extragalactic objects. An additional 814 are detected at velocities which suggest they may
not be extragalactic but are more likely to be Galactic high velocity cloud (HVC) features. The contents of
Table 1 are as follows:



• Col. 1: Entry number in the Arecibo General Catalog (AGC), a private database of extragalactic
objects maintained by M.P.H. and R.G. The AGC entry normally corresponds both to the OC and
the HI line source except in the cases of HVCs and other HI sources which cannot be associated with
an optical object with any high degree of probability. In those cases, the AGC number corresponds
only to the HI detection. An AGC number is assigned to all ALFALFA sources; it is intended to be
used as the basic cross reference for identifying and tracking ALFALFA sources as new data acquired
in overlapping regions supersede older results. Note that in previous ALFALFA catalogs, an index
number was used, a practice no longer employed; a cross-reference to these older identifications is
provided in Table 2. The designation of an ALFALFA source referring only to its HI emission (without
regard to its OC) should be given using the prefix "HI" followed by the position of the HI centroid as
given in Col. 3 of Table 1.

• Col. 2: Common name of the associated OC, where applicable. Further discussion of the process of
assigning optical counterparts is presented in §4.1.

• Col. 3: Centroid (J2000) of the HI line source, in hhmmss.s+ddmmss, after correction for systematic
telescope pointing errors, which are on the order of 20" and depend on declination. The systematic
pointing corrections are derived from an astrometric solution for the NRAO Very Large Array Sky
Survey (NVSS) radio continuum sources (Condon et al. 1998) found in the grids. As discussed in
Giovanelli et al. (2007) and Kent et al. (2008), the assessment of centroiding errors is complicated by
the nature of 3-D grid construction from the 2-D drift scans, often acquired in widely separated
observing runs, and, for resolved/confused sources, by unknown source structure. As those authors suggest,
the best assessment of HI centroid error is accomplished by comparison of the HI centroids with the
positions of the adopted OCs. An analysis of the positional offsets of the HI centroids from the
positions of the OCs yields a relation for the median error in the HI position, err_med,HI, as a function
of the signal-to-noise ratio, S/N (see Col. 8), for the α.40 sample:

$$ \mathrm{err}_{\rm med,HI}\,(\mathrm{arcsec}) = \begin{cases} \ldots - \ldots\,\log S/N + \ldots\,(\log S/N)^2, & \log S/N < 1.6 \\ \ldots, & \log S/N \geq 1.6 \end{cases} \qquad (1) $$

On average, the positional offset is about 18", but it can, in rare instances, exceed 1'; those cases are
noted in the comments included in Table 2.

• Col. 4: Centroid (J2000) of the most probable OC associated with the HI line source, where applicable,
in hhmmss.s+ddmmss. The OC has been identified and its likelihood has been assessed interactively
using tools provided through the SkyView website or the SDSS Explore Tool, in addition to the NASA
Extragalactic Database (NED) and the AGC, making use of judgmental criteria including redshift
(when known), size, morphology and optical color. The optical positions are normally accurate to
3" or better but may be less accurate in exceptional cases (very low surface brightness or peculiar,
disturbed objects). The process of assignment of the most probable OC is discussed in §4.1. It should be
noted that only one OC is assigned per HI source, although in reality confusion within the telescope beam
is a possibility. Suspected cases of confusion or ambiguous assignment of the OC are noted in the comments
included in Table 2.

• Col. 5: Heliocentric velocity of the HI source, cz⊙ in km s⁻¹, measured as the midpoint between the
channels at which the flux density drops to 50% of each of the two peaks (or of one, if only one is
present) at each side of the spectral feature; see also Springob et al. (2005a). The error on cz⊙ to be
adopted is half the error on the width, tabulated in Col. 6.

• Col. 6: Velocity width of the HI line profile, W50 in km s⁻¹, measured at the 50% level of each of
the two peaks, as described in Col. 5, and corrected for instrumental broadening. No corrections due
to turbulent motions, disk inclination or cosmological effects are applied. The estimated error on the
velocity width, e_w, in km s⁻¹, follows in parentheses. This error is the sum in quadrature of two
components: a statistical error and a systematic error associated with the subjective guess with which
the person performing parameter extraction estimates the spectral boundaries of the feature, flagged
during the interactive assessment of candidate detections. In the majority of cases, the systematic
error is significantly smaller than the statistical error and is therefore ignored.

• Col. 7: Integrated HI line flux density of the source, S21, in Jy km s⁻¹. This value corresponds to the
total HI line flux measured on the integrated spectrum obtained by spatially integrating the source
image over a solid angle of at least 7' × 7' and dividing by the sum of the survey beam values over the
same set of image pixels (see Shostak & Allen 1980; Kent & Giovanelli 2011). Integrated flux densities
of very extended sources with significant angular asymmetries may be misestimated by
our algorithm, which is optimized for measuring sources comparable with or smaller than the survey
beam. A special catalog with parameters of extended sources will be produced after completion of the
survey. The issue is especially severe for extended HVCs that exceed the size of the ALFALFA data
cubes. In these specific cases, only the flux in the knots of emission is measured. In general, the HVCs
have been cataloged here applying the same kind of S/N selection threshold as for the extragalactic
signals, with the exception of the southern extension of Wright's cloud, where, in addition to a bulk
measurement of the portion of the cloud lying within this region, a selection of the brightest knots
was measured to trace the structure. See Col. 12 and the corresponding comments for individual
objects. The estimated uncertainty of the integrated flux density, in Jy km s⁻¹, is given in parentheses.

• Col. 8: Signal-to-noise ratio S/N of the detection, estimated as

$$ S/N = \left( \frac{1000\, S_{21}}{W_{50}} \right) \frac{w_{\rm smo}^{1/2}}{\sigma_{\rm rms}} \qquad (2) $$

where S21 is the integrated flux density in Jy km s⁻¹, as listed in Col. 7; the ratio 1000 S21/W50 is
the mean flux density across the feature in mJy; w_smo is either W50/(2 × 10) for W50 < 400 km s⁻¹
or 400/(2 × 10) = 20 for W50 ≥ 400 km s⁻¹ (w_smo is a smoothing width expressed as the number of
spectral resolution bins of 10 km s⁻¹ bridging half of the signal width; the raw spectra are sampled at
24.4 kHz ≈ 5.2 km s⁻¹ at z ≈ 0); and σ_rms is the r.m.s. noise figure across the spectrum measured in
mJy at 10 km s⁻¹ resolution, as tabulated in Col. 9.

• Col. 9: Noise figure of the spatially integrated spectral profile, σ_rms, in mJy. The noise figure as
tabulated is the r.m.s. as measured over the signal- and RFI-free portions of the spectrum, after
Hanning smoothing to a spectral resolution of 10 km s⁻¹.



• Col. 10: Adopted distance in Mpc, D_Mpc. For objects with cz_cmb > 6000 km s⁻¹, the distance is simply
estimated as cz_cmb/H₀, where cz_cmb is the recessional velocity measured in the Cosmic Microwave
Background reference frame (Lineweaver 1996) and H₀ is the Hubble constant, adopted to be
70 km s⁻¹ Mpc⁻¹. For objects with cz_cmb < 6000 km s⁻¹, we use the local universe peculiar velocity model
of Masters (2005), which is based on data from the SFI++ catalog of galaxies (Springob et al. 2007)
and results from analysis of the peculiar motions of galaxies, groups, and clusters, using a combination
of primary distances from the literature and secondary distances from the Tully-Fisher relation. The
resulting model includes two attractors, with infall onto the Virgo Cluster and the Hydra-Centaurus
Supercluster, as well as a quadrupole and a dipole component. The transition from one distance
estimation method to the other is set at cz_cmb = 6000 km s⁻¹ because the uncertainties in
each method become comparable at that distance. Where available, primary distances from
the published literature are adopted. When the galaxy is a known member of a group (Springob et al.
2007), the group systemic recessional velocity cz_cmb is used to determine the distance estimate according
to the general prescription just described.

• Col. 11: Logarithm of the HI mass M_HI, in solar units, computed via the standard formula
M_HI = 2.356 × 10⁵ D²_Mpc S21 and assuming the distance given in Col. 10. No correction for HI
self-absorption has been applied.

• Col. 12: This column contains three relevant coded flags:

The first code, assigned as an integer value of 1, 2 or 9, refers to the category of the HI detection,
defined as follows:

Code 1 refers to sources whose S/N and general qualities make them reliable detections. These
signals exhibit a good match between the two independent polarizations observed by ALFALFA, a
spatial extent consistent with the telescope beam (or larger), an RFI-free spectral profile, and an
approximate minimum S/N threshold of 6.5 (Saintonge 2007a). These criteria lead to the exclusion of
some candidate detections with S/N > 6.5; likewise, some features with S/N slightly below this soft
threshold are included, due to optimal overall characteristics of the feature, such as well-defined spatial
extent, broad velocity width, and obvious association with an OC. We estimate that the detections
with code 1 in Table 1 are nearly 100% reliable; the completeness and reliability of the α.40 catalog
are discussed in §7.

Code 2 refers to sources categorized as "priors". They are sources of low S/N (< 6.5), which would
ordinarily not be considered reliable detections by the criteria set for code 1, but which have been
matched with OCs with known optical redshifts coincident (to within their errors) with those measured
in the HI line. We include them in our catalog because they are very likely to be real. In general,
however, they should not be used in statistical studies which require well-defined completeness limits;
this point is further discussed in §7.

Code 9 refers to objects assumed to be HVCs; no estimate of their distances is made.

Of the 15855 sources included in this data release, 11941 are classified as source code 1, 3100 are code
2, and 814 are code 9.

The second code, assigned as an alphabetic character, refers to a category reflecting the status of the
cross identification of the ALFALFA detection with an entry in the SDSS DR7 database, as judged
by the ALFALFA team. This code is used to identify galaxies which lie outside the SDSS DR7 sky
footprint or for which there are clearly issues with the identification. It should be noted that this code
refers only to the cross match with SDSS DR7. The cross-reference and basic parameters of the OCs
are given in Table 3. This code and its interpretation are as follows:

I: "identified": The PhotoObjID is set but no other indicative flags have been applied; this code
applies whether or not there is an SDSS spectroscopic counterpart.

O: "outside DR7": The OC lies outside of the SDSS DR7 footprint and thus no DR7 cross-match
can be performed.

U: "unidentified": No SDSS OC has been identified, but the object lies within the SDSS DR7
footprint.

N: "no DR7 photometric ID": No SDSS DR7 photometric source has been identified; assignment
of this code can result from proximity to a bright star, satellite trails, incomplete coverage or other
reasons.

M: "missing": The OC is in the SDSS DR7 footprint region but neither a PhotoObjID nor a SpecObjID
is returned to queries of the SDSS DR7 database.

P: "photometry suspect": The SDSS DR7 photometry for the associated PhotoObjID is suspect
for some reason as judged by the ALFALFA team. Assignment of this code is often associated with the
identification of multiple near-equal-flux photometric objects within an obviously single OC. Such cases
often apply to very large optical objects or to faint, low surface brightness and/or patchy systems. The
optical photometry associated with the SDSS "parent" object may be adequate but caution should be
exercised.

D: "displaced SDSS object": The SDSS PhotoObjID/SpecObjID is displaced from the optical galaxy center,
as identified by the ALFALFA team. The PhotoObjID may be legitimate; often this is the brightest photometric
"child". Because of the displacement, the SDSS redshift may not reflect the systemic recessional
velocity of the galaxy.

T: "two SDSS objects": The SDSS PhotoObjID associated with the galaxy center is displaced
from the target associated with the SDSS SpecObjID, as judged by the ALFALFA team, i.e., the
best PhotoObjID does not coincide with the SpecObjID. Usually, the SpecObjID is an off-center HII
region or other bright knot within the target galaxy.

S: "superposed SDSS object": The SDSS redshift corresponds to a superposed foreground star or
background QSO.

B: "bad SDSS solution": The SDSS redshift is unreliable or rejected for some unspecified reason.

The third code, given as an asterisk where applicable, indicates that a comment regarding the HI
detection and/or the assignment of the OC is included for this source in Table 2.
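Several of the tabulated quantities follow from the simple prescriptions given above. The minimal numpy sketch below collects them: W50 and the midpoint velocity from the 50%-of-peak crossings (Cols. 5 and 6, without the instrumental-broadening correction), the adopted cz error, the spatial integration of Col. 7 (sum of map values divided by the sum of beam values over the same pixels), the S/N of Eq. (2) for Col. 8, the Hubble-flow distance used for cz_cmb > 6000 km s⁻¹ (Col. 10) and log M_HI (Col. 11). It is not the ALFALFA extraction code: splitting the signal window at its midpoint to locate the two horns, linear interpolation between channels, and all function names are assumptions, and the Masters (2005) flow model used at lower velocities is not reproduced.

```python
import math
import numpy as np


def _crossing(v, f, level, from_left):
    """Velocity where the profile first reaches `level`, scanning inward from one edge."""
    order = range(len(f)) if from_left else range(len(f) - 1, -1, -1)
    prev = None
    for i in order:
        if f[i] >= level:
            if prev is None:
                return float(v[i])
            # linear interpolation between the outer channel (prev) and channel i
            return float(v[prev] + (level - f[prev]) * (v[i] - v[prev]) / (f[i] - f[prev]))
        prev = i
    raise ValueError("profile never reaches the requested level")


def w50_and_cz(v, f, two_peaks=True):
    """W50 and midpoint velocity from the 50%-of-peak crossings (Cols. 5 and 6).

    Assumes v ascends and v, f cover only the signal window. With two_peaks the
    window is split at its midpoint to locate each horn (an assumption); otherwise
    the single global peak is used on both sides. No instrumental-broadening
    correction is applied here.
    """
    f = np.asarray(f, dtype=float)
    if two_peaks:
        mid = len(f) // 2
        peak_lo, peak_hi = f[:mid].max(), f[mid:].max()
    else:
        peak_lo = peak_hi = f.max()
    v_lo = _crossing(v, f, 0.5 * peak_lo, from_left=True)
    v_hi = _crossing(v, f, 0.5 * peak_hi, from_left=False)
    return v_hi - v_lo, 0.5 * (v_lo + v_hi)


def cz_error(e_w):
    """Adopted error on cz: half the width error (Col. 5)."""
    return 0.5 * e_w


def spatially_integrated_spectrum(cube, beam):
    """Per-channel sum of map values divided by the sum of beam values (Col. 7).

    cube: (n_chan, ny, nx) in mJy/beam over the chosen pixels; beam: (ny, nx)
    normalized to 1 at its peak. Returns flux density per channel in mJy.
    """
    return np.asarray(cube, dtype=float).sum(axis=(1, 2)) / np.asarray(beam, dtype=float).sum()


def signal_to_noise(s21, w50, sigma_rms):
    """Eq. (2): S21 in Jy km/s, W50 in km/s, sigma_rms in mJy at 10 km/s resolution."""
    w_smo = min(w50, 400.0) / (2.0 * 10.0)  # W50/20, capped at 20 for W50 >= 400 km/s
    return (1000.0 * s21 / w50) * math.sqrt(w_smo) / sigma_rms


def hubble_distance_mpc(cz_cmb, h0=70.0):
    """Col. 10 distance for cz_cmb > 6000 km/s; the flow model used below that is not reproduced."""
    if cz_cmb < 6000.0:
        raise ValueError("cz_cmb < 6000 km/s: the Masters (2005) flow model is required")
    return cz_cmb / h0


def log_hi_mass(s21, d_mpc):
    """Col. 11: log10 of M_HI = 2.356e5 * D_Mpc**2 * S21, in solar masses."""
    return math.log10(2.356e5 * d_mpc**2 * s21)


print(round(signal_to_noise(1.0, 200.0, 2.0), 2), log_hi_mass(1.0, hubble_distance_mpc(7000.0)))
```

Capping w_smo at 20 means that, for a fixed mean flux density, the S/N stops growing with width beyond 400 km s⁻¹; dividing by the summed beam in the spatial integration makes a point source return its true flux density regardless of how many pixels are included.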



Only the first few entries of Table 1 are listed in the printed version of this paper. The full content of
Table 1 is accessible through the electronic version of the paper and will be made available also through our
public digital archive site¹ and the ALFALFA project data site².

In addition to the HI emission sources presented in Table 1, it is expected that the ALFALFA spectral
cubes will also contain evidence for HI in absorption. Darling et al. (2011) discuss a pilot program which
uses an adaptation of the ALFALFA pipeline to search for HI absorption along the line of sight to NVSS
sources in a small number of the ALFALFA cubes. The known HI absorber in the interacting system UGC
6081 was recovered. Because the standard ALFALFA reduction is not designed to look for such phenomena,
the HI absorption detection is not included in Table 1 and the reader is referred to Darling et al. (2011) for
its parameters.

¹ http://arecibo.tc.cornell.edu/hiarchive/alfalfa/
² http://egg.astro.cornell.edu/alfalfa/data

Table 2 contains comments about entries in Table 1 that were recorded in the course of extracting 
source parameters and identifying the OCs. The second column contains a cross-reference to the catalog 
identification used in earlier papers, which is no longer used. We repeat in Column 3 the HI detection 
code assigned to each source (the first code in Column 12 of Table 1, described above). Note that angular 
separations given in these comments are referenced to the centroid of the HI source, not to the position 
of the OC. These notes are somewhat heterogeneous in nature, having been incorporated during the process 
of data reduction by the individual responsible for source extraction. Since the extraction has been 
performed over a period of several years, the databases available to the person making the comments have 
evolved; thus the mention of nearby neighbors is not intended to be complete and should not be used in any 
derivation of local density. In some cases these notes identify issues with data quality, certainty of the OC 
or parameter extraction. The presence of a note does not necessarily mean that parameters are less certain 
than their errors indicate, as we tend to err on the conservative side in casting doubt. The notes are 
included here because they add to the legacy value of the dataset. 



Subsets of the α.40 catalog have been included in the derivation of the HI mass function (Martin et al. 
2010) and the HI width function (Papastergis et al. 2011); both papers include discussion of the sample 
characteristics, limitations and biases. Similar to figures shown in those papers, Figure 2 illustrates the 
distributions of (top to bottom) redshift cz, W50, log S21, log S/N and log M_HI for the full ALFALFA α.40 
sample presented in Table 1, while Figure 3 shows the corresponding Spaenhauer plot. Further discussion of 
the impact of survey characteristics on cosmological issues, and specifically on the derivation of the HI mass 
function, is given in §7. 
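The log M_HI values shown in distributions like Figure 2 follow from the integrated flux and the distance. A minimal sketch, using the standard optically thin relation M_HI = 2.356 × 10⁵ D² S21 (M_HI in solar masses, D in Mpc, S21 in Jy km s⁻¹). The pure Hubble-flow distance with H0 = 70 km s⁻¹ Mpc⁻¹ is an assumption for illustration only; distances to nearby sources are not simply cz/H0.

```python
import math

H0 = 70.0  # km/s/Mpc, assumed for this illustration

def hubble_distance_mpc(cz_kms: float, h0: float = H0) -> float:
    """Pure Hubble-flow distance; ignores peculiar velocities (a simplification)."""
    return cz_kms / h0

def log_hi_mass(s21_jy_kms: float, distance_mpc: float) -> float:
    """log10 of M_HI in solar masses for an optically thin HI line."""
    if s21_jy_kms <= 0 or distance_mpc <= 0:
        raise ValueError("flux and distance must be positive")
    return math.log10(2.356e5 * distance_mpc ** 2 * s21_jy_kms)

# A 1 Jy km/s source at cz = 7000 km/s (D = 100 Mpc) has log M_HI of about 9.37.
example = log_hi_mass(1.0, hubble_distance_mpc(7000.0))
```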



4. Optical Counterparts of ALFALFA Sources 

The principal aim of ALFALFA is to catalog all gas-bearing extragalactic objects in the local universe. 
An integral part of understanding this HI census is identifying the stellar counterpart associated with 
each HI source or, even more importantly, establishing that no such counterpart exists. During the ALFALFA 
data reduction process, optical images from the Palomar Digital Sky Survey (DSS2) and, where available, the 
SDSS are examined interactively alongside the ALFALFA HI dataset, and the most probable OC is identified 
and recorded. While this assignment may not be correct in individual cases, it provides a first approach 
to understanding the relationship between the HI source and its stellar counterpart. The notes included in 
Table 2 record comments on this process made by the ALFALFA team member performing this interactive 
stage of the data analysis. In this section, we describe the process by which OCs are identified and discuss 
unresolved issues, provide a cross-reference of sources to the SDSS DR7 database and summarize general 
results on the evidence for "optically dark" galaxies. 



4.1. Identifying Optical Counterparts 

We make use of Virtual Observatory tools embedded in the IDL-based ALFALFA reduction package 
(Kent & Giovanelli 2011) to access several public imaging and catalog databases at several stages in the data 
reduction process. During the process of HI parameter measurement (the routine called "galflux"), both 
DSS2(B) and SDSS images are examined to identify interactively the most probable OC of each ALFALFA 
source. Because of their generally superior quality and ancillary information, preference is given to the SDSS 
images where they are available. Entries in our internal AGC database as well as those listed in the 
NASA/IPAC Extragalactic Database (NED) can be retrieved and examined. The ALFALFA team member processing 
each source uses the available public information on color, morphology, redshift and separation from the HI 
centroid, in combination with his/her scientific judgement, in assigning an optical counterpart. It should be 
noted, however, that because the cataloged data presented here were reduced over a three-year period, 
not all current information was available at the time each assignment was made. Consistency checks 
are made later to look for redshift discrepancies or cases of large positional offset. 

With that caveat in mind, Figure 4 shows several examples that illustrate the process of identifying 
OCs and the uncertainties inherent in it. Each panel shows a 3′ × 3′ SDSS g-band image centered on the 
HI centroid. The superposed circle marks the OC identified in Table 1; the size of each circle is chosen 
arbitrarily for best illustration of the target. The panels illustrate some of the challenges of 
assigning the OC by highlighting four specific cases: 

• The upper left image is centered on the best-fit position of the HI source HI095452.2+142907, 
a weak source with S/N = 7.3. The corresponding OC, AGC 193821, is identified as the small galaxy SDSS 
J095453.79+142910.0, 22″ from the HI centroid and partly contaminated by the diffraction spike of a 
bright foreground star; the galaxy is more evident in the DSS2(B) image. There is no further optical 
information. 

• The upper right image is centered on the position of HI123120.9+050402, a marginal ALFALFA 
detection with S/N = 4.9. The OC, AGC 220720, is identified as VCC 1347 = CGCG 042-143 = 
J123117.00+050429.3, a small spiral galaxy offset from the HI centroid by about 64″; the large offset is 
not surprising given the low S/N of the HI detection. The SDSS optical redshift, 9830 ± 30 km s⁻¹, is 
slightly offset from the HI cz of 9873 ± 4 km s⁻¹. Although the S/N of the HI emission profile is low, the 
coincidence with an optical galaxy with an adequately close redshift match justifies the identification; 
the source is designated as a "prior" and assigned an ALFALFA detection category code of 2. 

• The lower left image is centered on the position of HI152240.3+055017, a very narrow (W50 = 24 km 
s⁻¹) feature at cz = 1796 km s⁻¹. The OC is identified as the dwarf galaxy AGC 258471, more evident 
in the DSS2(B) image, at J152238.7+054945; the SDSS pipeline identifies at least five photometric 
objects within the LSB emission associated with the dwarf, so its magnitude is poorly measured. 
The offset of the HI centroid from the optical object is about 38″. 

• The lower right image is centered on the position of HI160743.9+272201, a source with S/N = 10.9. 
As evident in the SDSS g-band image, there are several objects in the field, including a close pair 
associated with the SDSS spectroscopic target J160744.75+272140.2 = KUG 1605+275 NED01, with an 
SDSS redshift of 23676 ± 31 km s⁻¹. This redshift is too high for association with the ALFALFA HI 
source; several other galaxies in the vicinity of this system have similar redshifts. Careful examination 
of the SDSS image reveals a second object, not identified in the SDSS photometric database, which 
appears to partly overlap J160744.75+272140.2 but lies in its foreground, at J160743.9+272201. We 
identify the HI source with this foreground blue galaxy, which becomes AGC 749361. 
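The HI-to-OC offsets quoted in these examples can be recomputed from the position designations. A minimal sketch, assuming the compact `hhmmss.s±ddmmss` format used in the names; the helper names are illustrative. Because the HI designations are truncated to 0.1 s and 1″, results can differ by an arcsecond or two from the quoted offsets.

```python
import math
import re

# Compact sexagesimal positions as used in the designations,
# e.g. "095452.2+142907" (RA hhmmss.s, then Dec sign and ddmmss).
_POS = re.compile(r"(\d{2})(\d{2})(\d{2}(?:\.\d+)?)([+-])(\d{2})(\d{2})(\d{2}(?:\.\d+)?)")

def parse_position(text: str) -> tuple:
    """Convert 'hhmmss.s+ddmmss' into (RA, Dec) in degrees."""
    m = _POS.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized position string: {text!r}")
    rh, rm, rs, sign, dd, dm, ds = m.groups()
    ra = 15.0 * (int(rh) + int(rm) / 60.0 + float(rs) / 3600.0)
    dec = int(dd) + int(dm) / 60.0 + float(ds) / 3600.0
    return ra, -dec if sign == "-" else dec

def separation_arcsec(pos1: tuple, pos2: tuple) -> float:
    """Great-circle separation via the haversine formula (stable at small angles)."""
    ra1, dec1 = (math.radians(x) for x in pos1)
    ra2, dec2 = (math.radians(x) for x in pos2)
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(h))) * 3600.0

hi = parse_position("123120.9+050402")     # HI centroid, upper right panel
oc = parse_position("123117.00+050429.3")  # assigned OC (VCC 1347)
print(f"{separation_arcsec(hi, oc):.0f} arcsec")  # about 64 arcsec
```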

We emphasize again that, because of the ALFALFA centroid position uncertainty and its relatively large 
beam size, assignment of the most probable OC is a reasonable but imperfect process. Furthermore, it 
will remain a dynamic one, to be improved as new data provide better detail. For 
example, the current dataset does not yet systematically incorporate data from the SDSS-III 
survey or its DR8. 



4.2. Cross Reference with the SDSS 

Although not available at the time of earlier ALFALFA data releases, the completed SDSS legacy 
survey affords the opportunity to cross-reference the ALFALFA and SDSS datasets where their 
footprints overlap. As a new feature of this and future ALFALFA catalog releases, we provide in Table 3 the 
cross-identifications of ALFALFA sources with the photometric and spectroscopic catalogs associated with 
the SDSS, in this instance with Data Release 7 (DR7; Abazajian et al. 2009). Entries in Table 3 are as 
follows: 

• Col. 1: the source AGC number, identical to Col. 1 of Table 1. 

• Col. 2: the HI detection category code, identical to the first (integer) code in Col. 12 of Table 1. 

• Col. 3: The SDSS cross-reference category, identical to the second code in Col. 12 of Table 1. 

• Col. 4: The SDSS DR7 photometric catalog object identification number (PhotoObjID), where appli- 
cable. 

• Col. 5: The SDSS DR7 spectroscopic catalog object identification number (SpecObjID), where appli- 
cable. 

• Col. 6: The r-band model magnitude corresponding to the photometric object or its SDSS parent. 

• Col. 7: The (u−r) color associated with the OC from the SDSS, as reported in DR7. This value is 
used in Figure 7 in order to allow direct comparison with Figure 9 of Baldry et al. (2004); it has 
not been corrected for extinction or redshift. 

• Col. 8: The redshift corresponding to the SDSS DR7 spectroscopic catalog object, extracted from the 
SDSS DR7 database, where applicable. 

• Col. 9: The error on the redshift given in Col. 8, extracted from the SDSS DR7 database, where 
applicable. 

It is important that potential users understand the limitations associated with this ALFALFA-SDSS 
cross-reference. As noted by Giovanelli et al. (2007) and discussed earlier, the ALFALFA HI centroid 
uncertainty is of order 20″ but increases as the S/N decreases, as given in Equation 1. Furthermore, as is 
well known, the standard SDSS image reduction pipeline suffers from source blending and, more importantly, 
shredding, particularly for sources whose light distributions are patchy or of low surface brightness. The 
current ALFALFA reduction process includes an interactive step of direct examination of the SDSS imagery, 
and issues associated with blending/shredding are noted immediately. However, earlier ALFALFA datasets that 
predated the release of DR7 were not subject to such individual cross-examination. While attempts have been 
made to flag and check suspicious cases, it is likely that some misidentifications remain. 

The DR7 photometric catalog object identification number given in Column 4 is the PhotoObjID whose 
magnitude and position in SDSS DR7 most likely correspond to the OC; the actual "best magnitude" 
may be associated with the SDSS pipeline "parent". Users are cautioned to understand fully the issues 
associated with blending, shredding and poor sky subtraction, and to make use of warning flags and other 
quality indicators when using the photometry associated with the photometric object given here. Particularly 
relevant discussions of background subtraction issues are given in West et al. (2010) and Blanton et al. 
(2011). Similarly, the spectroscopic identification refers to the most probable and most closely related SDSS 
spectroscopic target. This cross-match likewise can suffer from issues of positional offset, signal-to-noise, etc., 
and should be treated with similar caution. The SDSS DR7 cross-reference category given in Col. 3 of Table 
3 (and also as the second of the codes given in Col. 12 of Table 1) provides further comment on quality issues 
as identified by members of the ALFALFA team. However, because some of the processing of ALFALFA data 
predates the release of SDSS DR7, this code assignment should not be considered complete: many but not 
all sources have been revisited after the release of DR7. The intent of providing the cross-reference is to make 
statistical studies more convenient and potentially homogeneous. But again, we emphasize the importance of 
visual examination of individual cases where such attention is critical to the drawing of scientific conclusions. 

Of the 15041 extragalactic (i.e., non-HVC) objects listed in Table 1, 2312 lie outside the SDSS DR7 
footprint and 199 are classified as "dark" (see §4.3). Of those with identified OCs that lie within the DR7 
footprint, 11740 are assigned SDSS code "I" (meaning that the SDSS photometric identification is acceptable 
and there is no issue with the spectroscopic identification, where one exists), while the others are given a 
code in Table 3 indicating a recognized issue with either the SDSS photometry or spectroscopy. The ALFALFA 
fall sky contains some regions for which only SDSS photometry is available; in the spring region, 
the photometric and spectroscopic footprints overlap more completely. Of the 11240 ALFALFA spring-sky 
galaxies with a corresponding SDSS photometric ID (of any code), 9377 (83%) have an associated entry in 
the SDSS spectroscopic catalog and 1863 (17%) do not. 

Figures 5 and 6 provide graphical illustrations of the relative strengths and weaknesses of the ALFALFA 
and SDSS surveys as tracers of large-scale structure in the local universe. Figure 5 shows cone 
diagrams of a four-degree-wide slice of the ALFALFA spring sky centered on Dec. = +26° and including 
the full ALFALFA bandpass redshift range, cz < 18000 km s⁻¹. The upper cone extends over the full cz 
range covered by ALFALFA and the lower one over only the inner cz < 9000 km s⁻¹. Blue open circles mark 
the locations of galaxies detected by ALFALFA, while red filled ones denote objects with redshifts from the 
SDSS DR7. The fall-off in the density of blue points follows the distribution seen in Figure 3. The "finger 
of God" radial line-up of optical-cz (red) points so prominent in the lower diagram is the Coma cluster, 
Abell 1656. Galaxies in that cluster are well known to be strongly HI deficient (Giovanelli & Haynes 1985; 
Magri et al. 1988), so that ALFALFA detects very few of them. As indicated by the numbers superposed on 
the diagram, the number of SDSS spectroscopic targets in the full ALFALFA volume is about three times 
the number of ALFALFA HI sources; in the inner volume illustrated in the bottom diagram, that ratio drops 
to two, and to ~1 for field galaxies at cz < 5000 km s⁻¹. While a strong bias against finding HI sources in 
the regions of rich clusters is clearly evident, the HI-bearing galaxies trace the large-scale supercluster 
structures well and include some of the most isolated objects found in this nearby volume. 
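The slice selection behind cone diagrams like those in Figure 5 can be sketched as follows. This is an illustration only: the function name and parameters are hypothetical, Dec is simply collapsed onto the plane, and cz is used directly as the radial coordinate rather than being converted to a distance.

```python
import numpy as np

def cone_slice(ra_deg, dec_deg, cz_kms, dec_center=26.0, half_width=2.0, cz_max=18000.0):
    """Select sources in a Dec slice and return Cartesian (x, y) for a cone plot.

    RA is the polar angle and cz the radius; the slice defaults to
    Dec = +26 deg +/- 2 deg (four degrees wide) and cz < 18000 km/s."""
    ra = np.asarray(ra_deg, dtype=float)
    dec = np.asarray(dec_deg, dtype=float)
    cz = np.asarray(cz_kms, dtype=float)
    keep = (np.abs(dec - dec_center) <= half_width) & (cz <= cz_max)
    theta = np.radians(ra[keep])
    return cz[keep] * np.cos(theta), cz[keep] * np.sin(theta), keep

# Two sources at the same RA and cz; only the one at Dec = +26 deg is in the slice.
x, y, keep = cone_slice([180.0, 180.0], [26.0, 20.0], [5000.0, 5000.0])
```

A plotting package with polar axes could be used instead of the explicit conversion; the Cartesian form is shown only to make the geometry explicit.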

For comparison, Figure 6 shows a similar cone plot covering a four-degree-wide slice of the ALFALFA 
fall sky centered on Dec. = +26°. The SDSS spectroscopic survey did not cover this region; the red filled 
circles mark objects with optical redshifts available from the literature. In this part of the sky, ALFALFA 
sources contribute the majority of redshifts, even at the outer boundary of the survey volume. It should be 
noted that the slice of the sky sampled in Figure 5 covers a strongly overdense region of the local universe, 
the Coma-Abell 1367 supercluster, whereas the fall region lies to the south of the main filament of the 
Pisces-Perseus supercluster and includes a portion of the void in front of it. As in all studies of the local 
universe, the actual large-scale structure contained in the survey volume can leave a strong imprint on the 
observed distribution of galaxies and their properties in limited samples. Further discussion of the impact of 
large-scale structure on cosmological inference is included in §7. 

Making use of the SDSS cross-reference tabulation, Figure 7 presents a color-magnitude diagram (CMD) 
for the α.40-SDSS overlap sample for comparison with similar diagrams extracted from the SDSS 
photometric survey alone. A similar CMD was independently constructed by Toribio et al. (2011a) for a sample 
of ALFALFA galaxies found in low-density environments. In Figure 7, grayscale and contours show the 
distribution of the HI-selected sample, and the axes correspond to the range illustrated in Figure 9 of 
Baldry et al. (2004). The superposed dashed line shows the optimum divider used by Baldry et al. (2004) to 
separate galaxies on the red sequence (above the curve) from those in the blue cloud (below it), as given by 
their equation 11. For the purpose of comparison with their Figure 9, no corrections for redshift or internal 
extinction have been applied to the magnitudes used to construct Figure 7. Figure 7 can also be compared 
with Figure 4 of Tempel et al. (2011), who used a large sample of galaxies from SDSS DR7 and did apply a 
K-correction; in their figure, those authors also categorize elliptical and spiral galaxies separately according 
to the SDSS catalog parameter f_deV, the fraction of the galaxy's luminosity contributed by the de Vaucouleurs 
profile. Clearly, the α.40 catalog is dominated by blue spiral galaxies and is strongly biased against the red 
sequence. As discussed by Tempel et al. (2011), some of the luminous, red objects are truly red, luminous 
and gas-bearing; other luminous objects appear red because they are edge-on disks for which the 
internal extinction correction is significant. Still, Figure 7 confirms the conclusion of Masters et al. (2010) 
that the most luminous gas-rich population includes a significant fraction of red galaxies. Further discussion 
of the stellar and star-forming properties of the ALFALFA sample, as derived from SED fitting of the optical 
photometry provided by the SDSS and the FUV/NUV photometry provided by the Galaxy Evolution Explorer 
(GALEX) satellite, will be presented in Huang et al. (2011a) and Huang et al. (2011b). 
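The red-sequence/blue-cloud split used above can be sketched as follows. The divider is equation 11 of Baldry et al. (2004) as we recall it, (u−r) = 2.06 − 0.244 tanh[(M_r + 20.07)/1.09], and should be checked against the original before use. The absolute magnitude uses only the distance modulus, consistent with the uncorrected magnitudes in Figure 7; the distance in Mpc is an input the user must supply.

```python
import math

def baldry_divider(abs_mag_r: float) -> float:
    """u-r color of the red/blue divider (Baldry et al. 2004, eq. 11, as recalled)."""
    return 2.06 - 0.244 * math.tanh((abs_mag_r + 20.07) / 1.09)

def absolute_mag(apparent_mag: float, distance_mpc: float) -> float:
    """Distance modulus only: M = m - 5 log10(D / 1 Mpc) - 25.
    No K-correction or extinction correction is applied."""
    return apparent_mag - 5.0 * math.log10(distance_mpc) - 25.0

def on_red_sequence(u_minus_r: float, abs_mag_r: float) -> bool:
    """True if the galaxy lies above (redward of) the divider."""
    return u_minus_r > baldry_divider(abs_mag_r)

# Example: r = 15.0 at 100 Mpc gives M_r = -20.0; the divider there is near u-r = 2.05.
m_r = absolute_mag(15.0, 100.0)
```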



4.3. ALFALFA Detections without Optical Counterparts 



One of the scientific drivers behind blind HI surveys is the possibility of contributing gas-rich but 
optically "dark" galaxies to the extragalactic census. Previous analyses, e.g., by Briggs (1990), of the 
statistics of targeted HI line surveys have shown that such objects must be rare; otherwise more sources 
would have been detected serendipitously in the random off-source positions observed in the total power 
position-switching mode used for most of those earlier surveys. Indeed, perhaps the best example of an 
optically dark galaxy is the southwest component of HI 1225+01 (Giovanelli & Haynes 1989; Chengalur, 
Giovanelli & Haynes 1995), but it is not a purely isolated object, being located on the outskirts of the Virgo 
cluster and part of a binary system with its dwarf galaxy companion to the northeast. Of the 4315 HI 
sources reported in the HIPASS catalog, 84% were identified with one or more possible OCs (Doyle et al. 
2005). Most of the remainder are located at low Galactic latitude, where Galactic extinction strongly inhibits 
the search for the stellar counterpart. In fact, Doyle et al. (2005) investigated through follow-up observations 
the 13 HIPASS sources without OCs and with A_V < 1 mag and concluded that not a single one could be 
claimed as an isolated dark galaxy. Some might be intergalactic in the sense of being associated with tidal 
debris fields or fragments of very extended HI disks, but in every case there were nearby, visible (stellar) 
objects at the same redshift. 

Because of ALFA's superior angular resolution at L-band in comparison with that of the Parkes telescope 
(4′ vs 15.5′), we are able to centroid the positions of the ALFALFA HI sources to better than 20″ on average 
and likewise to identify their OCs with greater certainty. Only 1013 of the 15855 sources presented in Table 1 
do not have assigned OCs. Of those, 814 blank-field objects have observed velocities that fall within the range 
characteristic of emission associated with some Milky Way population. All of these are assigned an HI source 
category code of 9 in Column 12 of Table 1. They are likely to be HVCs, although a few isolated objects 
with narrow velocity widths and small angular sizes (either barely resolved or unresolved) are candidate low 
mass extragalactic halos (Giovanelli et al. 2010). Their distribution and nature will be discussed elsewhere. 



Of the remaining 199 HI sources (<2% of the total extragalactic population) whose velocities suggest 
that they are truly extragalactic, we have individually and closely examined the SDSS and/or DSS2(B) fields 
to look for OCs; comments derived from that examination are included in Table 2. Some of these objects do 
not lie in the region covered by SDSS, making the identification of OCs more difficult, but by design, only 
a few lie in regions of significant optical extinction. 

Roughly 3/4 of the "dark" HI sources are located in fields where objects of similar redshift are found, 
albeit beyond the reasonable limits of coincidence given by the ALFALFA pointing accuracy. A number can 
be linked to previously known extended HI distributions such as the Leo Ring (Stierwalt et al. 2009), the tail 
of NGC 4254 (Haynes, Giovanelli & Kent 2007; Kent et al. 2007), the extended tail of NGC 4532/DDO 137 
(Koopmann et al. 2008) or the intergroup gas found in the NGC 7448/7463/7464/7465 group (Haynes 198x). 
Among the blank-field HI detections with SDSS data (including DR8) that are not contaminated by the presence 
of bright foreground stars, only about 50 remain as candidates to be isolated "dark" objects. These objects 
are the targets of a follow-up program that will confirm their reality as HI sources with the Arecibo single 
pixel L-band receiver, localize the HI emission via HI synthesis observations and search for associated low 
surface brightness stellar emission via optical imaging. 



4.4. OH Megamaser Candidates 



OH megamasers (OHMs) are powerful line sources associated with the starburst nuclei of merging 
galaxy systems. Briggs (1998) has pointed out that OHMs at z ~ 0.17 may contaminate a blind 
extragalactic survey such as ALFALFA. OHMs are an extremely rare phenomenon in the local universe; about 
100 are known out to a redshift of 0.265 (Darling & Giovanelli 2002). The main 18 cm OH lines occur 
at rest frequencies of 1665 and 1667 MHz. In OHMs, the emission at 1667.359 MHz dominates; that line is 
redshifted into the ALFALFA observing band for sources with 0.16 < z < 0.25. Using the large targeted 
survey for OHMs by Darling & Giovanelli (2002) as a baseline for the expected flux density and 
spectral characteristics of OHMs, it is probable that a few of the ALFALFA sources without OCs are in 
fact OHMs with 0.16 < z < 0.25. Confirmation that an ALFALFA source is an appropriately 
redshifted OHM and not an optically dark HI galaxy will require follow-up HI synthesis observations to 
localize the line-emitting region and optical/IR spectroscopy to confirm the redshift. 
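The 0.16 < z < 0.25 range quoted above follows from z = ν_rest/ν_obs − 1 applied at the band edges. A minimal sketch, assuming the ALFALFA spectrometer band spans 1335 to 1435 MHz; the usable band after RFI excision is narrower.

```python
NU_OH_MHZ = 1667.359  # rest frequency of the dominant OH main line
# Assumed spectrometer band edges (MHz); the usable band is somewhat narrower.
BAND_LOW_MHZ, BAND_HIGH_MHZ = 1335.0, 1435.0

def redshift_from_observed(nu_obs_mhz: float, nu_rest_mhz: float = NU_OH_MHZ) -> float:
    """Redshift at which a line of rest frequency nu_rest is observed at nu_obs."""
    return nu_rest_mhz / nu_obs_mhz - 1.0

z_min = redshift_from_observed(BAND_HIGH_MHZ)  # higher frequency -> lower redshift
z_max = redshift_from_observed(BAND_LOW_MHZ)
print(f"OH 1667 MHz line falls in band for {z_min:.3f} < z < {z_max:.3f}")
# -> OH 1667 MHz line falls in band for 0.162 < z < 0.249
```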

Already, however, four OHM candidates can be identified as such because their line emission occurs at 
frequencies higher than 1422 MHz. Under the assumption that the emission arises from the HI 21 cm line, 
the implied cz is too strongly blueshifted for plausible interpretation as an extragalactic or Galactic HI 
source. The properties of these four objects are given in Table 4, and optical images obtained from either 
the SDSS or DSS2(B) are shown in Figure 8. The entries in Table 4 are as follows: 



• Col 1: Entry number in the AGC. 

• Col 2: Centroid (J2000) of the emission-line source, in the format hhmmss.s±ddmmss, as in Col. 3 of Table 1. 
The designation of the candidate then adopts the identifier "OHMcand" plus this centroid position. 

• Col 3: Position of the identified optical counterpart, in the format hhmmss.s±ddmmss. 

• Col 4: z_opt, redshift of the optical counterpart, where known. 

• Col 5: z_OH, redshift of the candidate OHM, assuming its emission is dominated by the OH line at 
1667.359 MHz. 

• Col 6: cz_21, heliocentric velocity if the emission were associated with the HI line, in km s⁻¹. 

• Col 7: F_OH, OH line flux density, in Jy km s⁻¹. 

• Col 8: S/N of the OH line emission, defined as in Col. 8 of Table 1. 

• Col 9: RMS noise in the vicinity of the line emission, defined as in Col. 9 of Table 1. 
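Cols. 5 and 6 are two readings of the same observed frequency under different line identifications. A minimal sketch of the conversion, assuming the optical velocity convention cz = c(ν_rest/ν_obs − 1) for Col. 6 and treating the observed frequency as already corrected to the heliocentric frame; if the catalog uses a different convention, the numbers shift slightly.

```python
C_KMS = 299792.458
NU_HI_MHZ = 1420.405752  # HI 21 cm rest frequency
NU_OH_MHZ = 1667.359     # dominant OH main line

def observed_freq_from_cz21(cz21_kms: float) -> float:
    """Observed frequency (MHz) implied by an HI velocity, optical convention."""
    return NU_HI_MHZ / (1.0 + cz21_kms / C_KMS)

def cz21_from_observed_freq(nu_obs_mhz: float) -> float:
    """HI velocity (km/s) implied by an observed frequency, optical convention."""
    return C_KMS * (NU_HI_MHZ / nu_obs_mhz - 1.0)

def z_oh_from_cz21(cz21_kms: float) -> float:
    """Redshift if the same emission is instead the 1667.359 MHz OH line."""
    return NU_OH_MHZ / observed_freq_from_cz21(cz21_kms) - 1.0

def cz21_from_z_oh(z_oh: float) -> float:
    """Inverse of z_oh_from_cz21."""
    return cz21_from_observed_freq(NU_OH_MHZ / (1.0 + z_oh))

# AGC 181310: the SDSS redshift z = 0.167830 places OH emission near 1427.7 MHz,
# which read as HI would imply cz_21 of roughly -1540 km/s.
cz_example = cz21_from_z_oh(0.167830)
```

The 1422 MHz threshold used in the text corresponds to cz_21 of about −336 km s⁻¹ under this convention.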

In all four instances, there is a small object visible in public imaging databases which can be identified 
as the likely optical counterpart: 

• AGC 102708 = OHMcand000337.0+253215 is likely associated with SDSS J000336.02+253204.0, a 
very small object also evident in DSS2(B). There is no NED entry or redshift measurement. 

• AGC 102850 = OHMcand002958.8+305739 is likely associated with 2MASX J00295817+3058322, a 
well-formed spiral galaxy. There is no confirming redshift measurement. 

• AGC 181310 = OHMcand082311.7+275157 is likely associated with SDSS J082312.61+275139.8, also 
known as IRAS 08201+2801 and 5C 07.206, a known ULIRG. For this single object, the optical redshift 
from the SDSS, cz = 50314 km s⁻¹ (z = 0.167830 ± 0.000041), confirms the identification as an OHM; 
its OHM emission was previously discovered by Darling & Giovanelli (2001). 

• AGC 228040 = OHMcand124540.5+070337 is likely associated with SDSS J124545.66+070347.3, a 
spiral galaxy viewed at high inclination, as evident in Figure 8. No confirming redshift measurement is 
available. 

OHMs may also be identified in the subset of low S/N sources not included in the current catalog 
because they do not meet the criteria of Codes 1 and 2. 

By the simplest argument, based on the fraction of the usable ALFALFA bandwidth above 1422 MHz 
and assuming that these four candidates are in fact OHMs, it is possible that half of the "dark galaxy" 
candidates discussed in §4.3 might be OHMs at 0.175 < z < 0.245. A similar estimate comes from considering 
the α.40 volume and the OHM luminosity function at low z (Darling & Giovanelli 2002). A more systematic 
approach to the identification of OHMs throughout the full bandpass, using the 3-D ALFALFA dataset, 
is currently being undertaken by members of the ALFALFA collaboration. 



5. Validation of ALFALFA HI Parameters 



Most targeted extragalactic HI line flux densities are extracted from spectra obtained using a total 
power position-switching technique. As outlined in §2, the ALFALFA dataset is generated using a very 
different approach, whereby ALFA drift-scan data are obtained months and sometimes years apart and 
without Doppler tracking. The 2-D datasets (frequency versus time) for the two individual polarizations of 
each of the seven beams are bandpass subtracted and flagged for RFI. After the acquisition of all the drifts 
for a region of the sky, the 3-D spectral grid is generated. As with any new survey, it is critical to verify 
that the spectral scales (velocity and flux density) at each grid point are accurate. 

The Cornell Digital HI Archive presents a large compilation of digital HI line spectra obtained in 
pointed observations of optically selected targets (Springob et al. 2005a), which have been digitally analyzed 
using algorithms similar to those adopted for ALFALFA. Because those spectra were obtained with a variety 
of single-dish telescopes and spectrometers, careful attention was paid to correcting for instrumental effects 
such as pointing errors, source extent, instrumental broadening and spectral smoothing. Corrections for the 
various effects were modeled and tested to produce a homogeneous catalog of extracted properties with their 
associated error estimates. Here, we present the validation of the ALFALFA velocity, velocity width and 
flux density scales by comparison of α.40 catalog parameters with previous targeted HI line observations 
of sources that have been re-detected by ALFALFA. Of the 2073 galaxies that are contained in both the 
Springob et al. (2005a) and the α.40 catalogs, 1887 are classified as ALFALFA Code 1 sources and 186 are 
Code 2 detections. 
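A comparison of this kind reduces to statistics of the residuals (ALFALFA minus archive) over the overlap sample. A minimal sketch: the median offset, a scatter estimated from the median absolute deviation, and iterative 3σ rejection are illustrative choices and are not necessarily the procedure used for the validation below.

```python
import numpy as np

def _robust_stats(x):
    med = np.median(x)
    # 1.4826 * median absolute deviation estimates a Gaussian sigma
    return med, 1.4826 * np.median(np.abs(x - med))

def compare_scales(alfalfa, archive, clip=3.0, max_iter=10):
    """Median offset and robust scatter of (ALFALFA - archive) residuals.

    Points more than `clip` robust sigmas from the median are rejected
    iteratively; returns (offset, scatter, outlier_mask)."""
    resid = np.asarray(alfalfa, dtype=float) - np.asarray(archive, dtype=float)
    finite = np.isfinite(resid)
    keep = finite.copy()
    for _ in range(max_iter):
        med, sigma = _robust_stats(resid[keep])
        new_keep = finite & (np.abs(resid - med) <= clip * sigma)
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    med, sigma = _robust_stats(resid[keep])
    return med, sigma, finite & ~keep
```

A systematic scale error would show up as a median offset well outside its uncertainty, while isolated outliers (for example, from old data with wrong velocity scales) are flagged without biasing the offset.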



5.1. Validation of the ALFALFA Velocity Scale 

The ALFALFA "minimum-intrusion" observing mode acquires data without Doppler tracking, i.e., in 
topocentric mode. Heliocentric corrections are applied in the Fourier domain: the appropriate 
velocity shift at each point (each spectrum associated with each one-second record for each polarization 
of each beam) is calculated, converted to a phase gradient across the bandpass and applied to the Fourier 
transform of each spectrum. The inverse Fourier transform then gives the spectrum in the heliocentric rest 
frame, which is used thereafter to yield the systemic velocity cz and the HI profile velocity width W50. 
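The Fourier-domain correction rests on the shift theorem: delaying a sampled spectrum by δ channels multiplies its discrete Fourier transform by exp(−2πikδ/N), which also works for fractional δ. A minimal NumPy sketch of that step; the actual ALFALFA pipeline is written in IDL and may differ in detail, and the velocity-to-channel helper is hypothetical, assuming channels that are uniform in velocity over the small shift.

```python
import numpy as np

def fourier_shift(spectrum, shift_channels):
    """Shift a 1-D spectrum by a (possibly fractional) number of channels by
    applying a linear phase gradient to its discrete Fourier transform.

    Positive shifts move features toward higher channel index; the shift is
    circular, so features near the band edges wrap around."""
    spectrum = np.asarray(spectrum, dtype=float)
    k = np.fft.fftfreq(spectrum.size)  # signed frequencies, cycles per channel
    phase = np.exp(-2j * np.pi * k * shift_channels)
    return np.fft.ifft(np.fft.fft(spectrum) * phase).real

def velocity_to_channels(delta_v_kms, channel_width_kms):
    """Hypothetical helper: velocity correction expressed in channels."""
    return delta_v_kms / channel_width_kms
```

Whether a positive velocity correction maps to a positive or negative channel shift depends on how the spectrometer orders its channels, so the sign must be set by the user.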

It is important to note that the specific definitions of HI systemic velocity and global profile velocity 
width are not uniformly adopted in the literature. For ALFALFA, we adopt the same convention as that used 
by Springob et al. (2005a); that is, polynomials are fit to each side of the two-horned profile, and cz and 
W50 are measured at the level of 50% of the peak intensity on either horn: cz is then the midpoint and W50 
the full width at that level. Where appropriate (face-on galaxies; dwarf systems), a single Gaussian provides 
the best fit and is similarly measured. Figure 9 illustrates the comparison of the two parameters for the 
α.40-Springob et al. (2005a) HI archive overlap sample. In both panels, the vertical axis shows the residual 
(ALFALFA minus HI archive). The occurrence of outliers is expected because (a) ALFALFA spectra correspond 
to only 40 s of integration time per beam on source, whereas the targeted spectra generally have much longer 
integrations; (b) targeted spectra are affected by pointing errors, arising either from the coordinates used to 
position the telescope or from intrinsic telescope pointing inaccuracy; and (c) blends with close companions 
occur where the pointed spectra were taken with smaller single-dish telescopes. A few cases of gross 
disagreement are explained by errors in the velocity scales of very old HI data, which were acquired in the 
days before significant information was written into data headers, when the setup of the backend electronics 
and spectrometer required physical cabling and hand dial-setting at the start of each observing run, and when 
records of frequency offsets for different quadrants of the spectrometer were kept only on hand-written index 
cards; these cases are noted in the comments in Table 2. The appearance in the lower panel of some outliers 
at relatively low W50 reminds us that at low S/N or in the presence of residual baseline structure, broad 
widths may be underestimated. The dependence of the sensitivity on line width is discussed in §6. As evident 
in Figure 9, the comparison of the velocity scales reveals no systematic offsets and agreement within the 
expected errors. 
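The 50%-of-peak convention for cz and W50 described above can be sketched as follows. This is a simplification: linear interpolation between channels replaces the polynomial fits to each side of the profile, each horn's peak is taken as the maximum of its half of the window (so the window must be roughly centered on the line), and a clean, baseline-subtracted profile is assumed.

```python
import numpy as np

def cz_w50(velocity, flux):
    """Return (cz, W50) from a baseline-subtracted, roughly centered HI profile."""
    v = np.asarray(velocity, dtype=float)
    s = np.asarray(flux, dtype=float)
    mid = s.size // 2
    half_left = 0.5 * s[:mid].max()   # 50% of the left horn peak
    half_right = 0.5 * s[mid:].max()  # 50% of the right horn peak
    # First channel, scanning inward from each edge, at or above the 50% level.
    i = int(np.argmax(s >= half_left))
    j = s.size - 1 - int(np.argmax(s[::-1] >= half_right))
    if i == 0 or j == s.size - 1:
        raise ValueError("profile reaches the edge of the window")
    # Linear interpolation between the bracketing channels on each side.
    v_left = np.interp(half_left, [s[i - 1], s[i]], [v[i - 1], v[i]])
    v_right = np.interp(half_right, [s[j + 1], s[j]], [v[j + 1], v[j]])
    return 0.5 * (v_left + v_right), abs(v_right - v_left)

# Synthetic trapezoid: flat top from 150 to 250 km/s with 10 km/s edges,
# so the 50% points are at 145 and 255 km/s (cz = 200, W50 = 110).
v = np.arange(0.0, 400.0)
s = np.clip(np.minimum((v - 140.0) / 10.0, (260.0 - v) / 10.0), 0.0, 1.0)
cz, w50 = cz_w50(v, s)
```

Scanning inward from the window edges picks the outermost 50% crossings, which is what keeps a dip between the two horns from truncating the width.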



5.2. Validation of the ALFALFA Flux Density Scale 



As discussed in van Zee et al. (1997), practical limitations and instrumental uncertainties restrict the
accuracy with which HI line flux densities can be measured to no better than a few percent. Despite regular
calibration via the injection of a noise diode, drifts in the electronic gain, amplifier instabilities, sidelobe
variations and standing waves (caused by multiple reflections within the optical path of cosmic continuum
sources or terrestrial RFI) induce variations in the total power, while baseline irregularities and data loss
due to RFI impact the measurement of flux density in noisy data. As discussed in Springob et al. (2005a),
HI line flux densities derived from targeted (pointed) observations are typically accurate to no better than
15%; older datasets taken when amplifiers were substantially less stable than today are probably accurate
to no better than 25%.

Calibration of the ALFALFA dataset is performed in two separate stages. First, during the course
of an observing run, a noise diode, calibrated by the engineering staff in the lab, is fired once every 600
seconds. The data stream then includes a record with this additional power source (the "cal-on" record). All
observing runs contain at least 9 such calibrations; longer ones may contain as many as 60. A polynomial
fit is performed to the ratio of the total power with the calibration diode on versus off, for the whole set
of calibrations in an observing block. This polynomial is then used to correct the individual records of the
drift scan data. The second stage of calibration is performed on the data after grid construction, making
use of the radio continuum sources which the grids contain. The flux densities of the sources contained in
the grids are compared with published values in the NVSS (Condon et al. 1998), and an average
correction factor is then applied to tie the ALFALFA flux scale to the NVSS. Further details on calibration of the
ALFALFA dataset are given in Kent & Giovanelli (2011).
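The two calibration stages can be illustrated with a minimal Python sketch; it is not the ALFALFA pipeline code. It assumes, beyond what is stated above, that the diode adds a known temperature T_cal to the system temperature, so the on/off power ratio is r = (T_sys + T_cal)/T_sys and hence T_sys = T_cal/(r − 1), and that records are expressed as fractions of the cal-off power. The function names, the polynomial degree and the synthetic numbers are hypothetical.

```python
import numpy as np


def fit_cal_ratio(t_cal, p_on, p_off, degree=2):
    """Fit a polynomial in time to the cal-on / cal-off total-power ratio.

    t_cal: times (s) of the diode firings within one observing block.
    p_on, p_off: total power (arbitrary counts) with the diode on and off.
    """
    ratio = np.asarray(p_on, dtype=float) / np.asarray(p_off, dtype=float)
    # Polynomial.fit rescales the time axis internally, keeping the fit well conditioned.
    return np.polynomial.Polynomial.fit(t_cal, ratio, degree)


def system_temperature(ratio_poly, t, t_cal_kelvin):
    """T_sys(t) = T_cal / (r(t) - 1), assuming P_on/P_off = (T_sys + T_cal)/T_sys."""
    return t_cal_kelvin / (ratio_poly(np.asarray(t, dtype=float)) - 1.0)


def calibrate_records(frac_power, t, ratio_poly, t_cal_kelvin):
    """Convert records expressed as P/P_off (dimensionless) to kelvin."""
    return np.asarray(frac_power, dtype=float) * system_temperature(ratio_poly, t, t_cal_kelvin)


def nvss_correction_factor(s_grid, s_nvss):
    """Average ratio of published NVSS flux densities to those measured in a grid."""
    return float(np.mean(np.asarray(s_nvss, dtype=float) / np.asarray(s_grid, dtype=float)))


# Synthetic block: 9 diode firings, one every 600 s, with T_sys = 25 K and T_cal = 2 K.
t_fire = np.arange(9) * 600.0
p_off = np.full(9, 1000.0)
p_on = p_off * (25.0 + 2.0) / 25.0
poly = fit_cal_ratio(t_fire, p_on, p_off)
tsys_mid = float(system_temperature(poly, 1800.0, 2.0))
scale = nvss_correction_factor([1.0, 2.0], [1.1, 2.2])
```

The polynomial smooths the gain history over the block, so individual records between diode firings are corrected with an interpolated rather than a stepwise value; the grid-level NVSS factor is then a single multiplicative tie applied on top.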



Even when gain corrections for frequency dependence and other effects are correctly calibrated out, HI
line flux densities observed with single point observations must be corrected for beam dilution and, often,
pointing errors. In addition to the inaccuracy of telescope pointing, particularly important in early Arecibo
observations, the input positions used to point the telescope were accurate only at the level of 0.5–1′
for some of the oldest observations used to acquire the archival data reanalyzed by Springob et al. (2005a).



Springob et al. (2005a) report both raw (as observed) and corrected values of the HI line flux density
for galaxies observed via pointed observations of optically-selected targets. The true HI line flux density 
was derived by applying corrections for telescope pointing errors, errors in the positions used to point the 
telescope (both of which apply more importantly to older datasets) and for partial resolution by the telescope 
beam. The latter is derived by adopting a hybrid correction for source extent that is based on a modeled HI 
distribution scaled by the optical size and an average telescope beam power pattern. As a drift scan mapping 
survey, ALFALFA flux densities are not subject to such corrections. Figure 10 shows the comparison of the
HI line flux densities measured by ALFALFA with the values given in the Springob et al. (2005a) HI archive.
The latter are corrected for pointing and position errors and for source extent (but not for internal HI
absorption). The vertical axis shows the ratio of the HI line flux densities reported in the two catalogs.
ALFALFA detection Code 1 objects are shown as blue open circles; the lower S/N Code 2 objects are shown
as red filled circles. Since the error in the HI line flux density for both surveys depends on the HI line flux
density itself as well as the S/N of the spectrum and the magnitude of the corrections applied to the pointed
data, the increasing scatter seen in the ratio at low HI line flux densities is as expected. When corrections
for source extent are applied to the pointed data, the flux density scales are coincident within the errors.



Among the sources with highest HI line flux density, Figure 10 suggests that the ALFALFA flux algorithm
may be missing some flux. The total flux should be recovered since ALFALFA is a mapping survey, but the
HI line flux from very extended sources, especially those located towards the edges of the constructed grids,
could be lost due to the finite grid size and the bandpass subtraction and grid flattening schemes. For those,
processing tools alternative to the standard pipeline will be developed after completion of the main survey.

In order to assess the contribution of very diffuse, extended HI in the vicinity of nearby, isolated galaxies,
Haynes et al. (1998) and Hogg et al. (2007) observed a carefully selected sample of ~100 galaxies with
both the former 42 m telescope and the Green Bank Telescope (GBT). As they note, the uncertainty in the HI line flux
density for their high signal-to-noise data is < 1%; on the other hand, the uncertainties contributed by 
fitting the polynomial baseline and defining the boundaries of the emission profile are considerably larger. 
Because an unblocked aperture should deliver reduced standing waves and minimal stray radiation, flux 
densities measured with the GBT should be more accurate than ones measured with a complex instrument 
like Arecibo. At this point, there are only 12 galaxies in common with the sample observed very accurately 
with the former 42 m telescope by Haynes et al. (1998), not enough for conclusive results. These issues will 
be explored in a future work. 

Although it might be expected to serve as the better dataset for examining systematic uncertainty
and testing for missing flux from extended/bright sources, the northern HIPASS survey (Wong et al. 2006)
does not in practice provide an adequate comparison sample, for several reasons. First, as mentioned above,
flux calibration uncertainties do not dominate most HI flux density errors; baseline uncertainties, noise and
beam effects do. A drawback of the northern HIPASS catalog is that some of it suffers from residual baseline
ripple, particularly when observations were made during the daytime (Wong et al. 2006). Secondly, the
sensitivity difference between Arecibo and Parkes means that the S/N of most ALFALFA/Springob et al.
(2005a) detections is typically much higher than that of HIPASS. This fact alone, on top of the baseline
issues, gives the Springob et al. (2005a) spectra a significant advantage over HIPASS in terms of parameter
accuracy. Thirdly, although there are 1000 galaxies in the northern HIPASS dataset (Wong et al. 2006) at
Dec. > +2°, only ~350 lie in the overlap with the present α.40 catalog. Lastly, there are no Code 2 sources
detected by HIPASS (which is not surprising, given its much poorer sensitivity), so we cannot make the
comparison between Codes 1 and 2. However, for the record, we include in the lower panel of Figure 10
the similar comparison of flux densities from ALFALFA and HIPASS; clear cases of confusion within the
Parkes beam are not included in this analysis. Curiously, ALFALFA detects more flux density than HIPASS
in some cases; examination of those reveals that they are mainly ALFALFA detections with broad W50 for
which HIPASS detects, at much lower S/N, a lower HI line flux density and a narrower W50, clearly missing
some of the HI line emission detected by ALFALFA.



6. ALFALFA Source Completeness and Reliability 

The practical exploitation of any survey requires an understanding of its source sensitivity, completeness 
and reliability. In comparison with previous blind HI surveys, ALFALFA offers a much richer dataset which 
itself can be used to probe the robustness of its source catalog. 

Source extraction and parameter measurement for ALFALFA is performed in a two-step process, which
includes automated as well as interactive procedures. Initial source extraction is performed by a fully
automatic matched-filtering method (Saintonge 2007a,b). The algorithm uses templates which vary in shape as a
function of profile width (Gaussian for narrow profiles, Hermite functions for broad profiles), and outperforms
algorithms based on smoothing followed by peak finding (see Figure 3 in Saintonge 2007a). The reliability
(i.e. the fraction of detections that correspond to real sources) of this automated method is estimated to be
~95% for sources with S/N_extractor > 6.5, a value that was determined by performing source extraction on
regions of the ALFALFA datacubes expected to be devoid of cosmic signals (corresponding to the velocity
range −2000 < cz < −500 km s⁻¹; see §5.4 and Figure 8 in Saintonge 2007a). Source candidates are then
visually inspected, and source parameters are interactively measured and catalogued. It should be noted that
the parameters of the final catalogued sources (e.g. S21, S/N, W50) do not generally coincide with the
values determined by the automatic signal extractor, because the two procedures use different definitions
and calculation methodologies for the parameters (Giovanelli et al. 2007), and the human intervention is
designed to optimize the measurement accuracy and further improve the reliability of the catalog by rejecting
spurious detections that correspond to low-level RFI, poorly sampled data and residual baseline fluctuations.
We therefore expect the final reliability of ALFALFA Code 1 detections to be very close to 100%.
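The matched-filtering idea can be illustrated on a single one-dimensional spectrum with a minimal Python sketch. As a simplifying assumption it uses only Gaussian templates (the ALFALFA extractor also uses Hermite functions for broad profiles and operates on full datacubes), assumes white noise of known rms per channel, and normalizes each template to unit L2 norm so that the filtered peak divided by the rms is an S/N. The function names are hypothetical.

```python
import numpy as np


def gaussian_template(width_ch):
    """Gaussian template with dispersion width_ch channels, normalized to unit L2 norm."""
    half = int(np.ceil(4 * width_ch))
    x = np.arange(-half, half + 1)
    t = np.exp(-0.5 * (x / width_ch) ** 2)
    return t / np.sqrt(np.sum(t ** 2))


def matched_filter(spectrum, rms, widths):
    """Return (S/N, best width, peak channel) over a bank of Gaussian templates.

    With a unit-norm template, the noise in the filtered output equals the
    per-channel rms, so filtered value / rms is the S/N of that match.
    """
    best = (-np.inf, None, None)
    for w in widths:
        out = np.correlate(spectrum, gaussian_template(w), mode="same")
        i = int(np.argmax(out))
        snr = out[i] / rms
        if snr > best[0]:
            best = (snr, w, i)
    return best


# Synthetic, noise-free line of dispersion 5 channels centred on channel 100.
channels = np.arange(256)
spec = np.exp(-0.5 * ((channels - 100) / 5.0) ** 2)
snr, width, peak = matched_filter(spec, rms=0.1, widths=[2, 5, 10])
detected = snr > 6.5  # the S/N threshold quoted in the text for ~95% reliability
```

By the Cauchy-Schwarz inequality, the unit-norm template whose shape matches the line gives the largest filtered response, which is why the template bank, rather than a single smoothing kernel, picks out the profile width.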



However, as discussed by Saintonge (2007a), the reliability of ALFALFA sources extracted by the
matched-filtering algorithm drops precipitously below a S/N of 6.5. The Code 2 sources included in
Table 1 fall below this nominal ALFALFA S/N detection threshold, but are included in the catalog because
they coincide with an optical galaxy of known (prior) and coincident redshift. Although these sources should
not be included in statistical studies which require careful consideration of survey completeness and
sensitivity limits, the vast majority are likely real HI line sources, and the gas in them will also contribute to
the overall HI density in the local universe. Hence, we include them in the following analysis of ALFALFA
completeness and their impact on measurements of cosmological parameters.

The two-step process used to identify, extract and measure the ALFALFA detections presented in
Table 1 results in a catalog of reliable detections that is dependent on both the integrated HI line flux density of a
given source and its global HI line profile width, W50. Like all fixed integration-time spectroscopic surveys,
ALFALFA is more sensitive to narrow HI profiles than to broader ones of the same integrated line flux. Based
on the demonstrated performance of the observing equipment and the signal extraction pipeline during the
single-pass precursor observations, Giovanelli et al. (2005b) predicted the specific relationship between the
integrated flux density detection threshold (S21,th, in Jy km s⁻¹) and the profile width (W50, in km s⁻¹) of
a source in terms of the S/N required for inclusion in the catalog:

$$
S_{21,\mathrm{th}} =
\begin{cases}
0.15\,(S/N)\,(W_{50}/200)^{1/2}, & W_{50} < 200 \\
0.15\,(S/N)\,(W_{50}/200), & W_{50} \geq 200
\end{cases}
\tag{3}
$$

Note that the above expression differs from that given in Equation 2 of Giovanelli et al. (2005b) (numerical
factor of 0.15 vs. 0.22) because that work adopts the rms appropriate to the single-drift maps used in the
precursor observations. One of the principal conclusions of Giovanelli et al. (2005b) was that the two-pass
strategy adopted for ALFALFA would improve on that employed by the precursor program by a factor of
1.5.

"Sensitivity" is a qualitative term that can be defined in terms of the survey "completeness". We refer
to the completeness of the ALFALFA survey as that fraction of cosmic sources of a given integrated flux
density and within the survey solid angle that are detected by ALFALFA and included in the α.40 catalog.
Other blind HI surveys (e.g. ADBS, HIPASS) have estimated the completeness of their catalogs as a function
of profile width (W50) by examining their ability to recover synthetic sources of known characteristics (peak
flux, S/N, W50, etc.) injected into the spectral cubes. One of the motivations for such an approach is to
assess the reliability of sources in the presence of non-Gaussian noise. As noted by Saintonge (2007a) (see §5.4
and Figure 8 of that paper), the impact of non-Gaussian noise on the automatic signal extractor developed
for ALFALFA is generally minimal above S/N = 6.5. Its presence, principally in the form of the very broad
spectral standing waves resulting from reflections in the telescope focal structure (e.g., Briggs et al. 1997),
is responsible for the upturn at large W50 in Equation 3. At the narrower widths, there is no evidence that
a Gaussian assumption is unfair.
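Equation 3 can be evaluated directly; this short Python sketch (the function name is hypothetical) encodes its two branches and exposes the numerical factor, so that the precursor value of 0.22 from Giovanelli et al. (2005b) can be substituted for the two-pass value of 0.15.

```python
import numpy as np


def s21_threshold(snr, w50, coeff=0.15):
    """Predicted integrated-flux detection threshold of Equation 3.

    snr: required S/N; w50: profile width in km/s; returns Jy km/s.
    coeff=0.15 for the two-pass ALFALFA maps; 0.22 for the single-drift
    precursor maps of Giovanelli et al. (2005b).
    """
    x = np.asarray(w50, dtype=float) / 200.0
    # sqrt(W50/200) below 200 km/s, linear in W50 above it (the broad-profile upturn).
    return coeff * snr * np.where(x < 1.0, np.sqrt(x), x)


thresholds = {w: float(s21_threshold(6.5, w)) for w in (50.0, 200.0, 800.0)}
```

For example, at S/N = 6.5 the predicted threshold rises from about 0.49 Jy km s⁻¹ at W50 = 50 km s⁻¹ to about 3.9 Jy km s⁻¹ at W50 = 800 km s⁻¹; the two branches meet at W50 = 200 km s⁻¹.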

Now that a significant ALFALFA dataset exists, the data themselves can be used to derive the true
sensitivity limits. The analysis of the real survey data is motivated both by a desire to use the real observables
rather than predictions of the performance of the observing equipment and signal extraction pipeline, and
especially by the fact that the ALFALFA survey has actually outperformed its predictions, as discussed
in Appendix A of Martin et al. (2010). Hence, we follow a different method to determine the ALFALFA
completeness that makes no use of "fake sources", but relies instead on the α.40 catalog itself. For a
flux-limited sample from a uniformly distributed population, number counts will follow a power law with an
exponent of −3/2. We can then determine the onset of incompleteness where our data deviate from this
form. Briefly, the method consists of the following steps:

1. The Code 1 sources are divided into 32 equally spaced bins in log W50.

2. For each width bin, we count the number N of detected sources in logarithmic intervals of flux density
to determine the dN/dlog S21 histogram; apart from the impact of large-scale structure in the survey
volume, number counts are expected to follow a power law with an exponent of −3/2.

3. For each bin in log W50, we plot S21^(3/2) dN/dlog S21 versus S21; see Figure 11 for three representative
width bins. This distribution should be flat if all sources are accounted for. A downturn at low S21
thus marks the onset of incompleteness.

4. We fit an error function to each histogram (red dashed lines) and assume completeness over the
well-sampled range of S21 over which the distribution shows a flat plateau.

5. We calculate the integrated flux density where the ALFALFA completeness crosses 90%, 50%, and 25%
(vertical red lines mark the 90% completeness in each bin). In practice, the distributions drop off in
the same way, such that the 50% and 25% limits occur at a constant offset in log S21 from
the 90% value across all bins.

6. The values of S21,90% for each W50 bin are then fit with the combination of two straight lines, similar
to Equation 3, with a break at W50 ≈ 300 km s⁻¹ (log W50 = 2.5).
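Steps 2 to 5 can be sketched for a single width bin in Python with SciPy. The exact error-function parameterization is not given in the text, so the sketch assumes a flat plateau multiplied by a Gaussian cumulative distribution in log S21; the function names and the synthetic test values are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf, erfinv


def weighted_counts(s21, n_bins=20):
    """Bin centres in log S21 and S21^(3/2) dN/dlog S21 for one width bin."""
    log_s = np.log10(np.asarray(s21, dtype=float))
    counts, edges = np.histogram(log_s, bins=n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Dividing by bin width gives dN/dlog S21; the S^(3/2) factor flattens a -3/2 power law.
    return centres, (10.0 ** centres) ** 1.5 * counts / np.diff(edges)


def erf_model(log_s, plateau, mu, sigma):
    """Plateau times a Gaussian CDF in log S21 (assumed error-function form)."""
    return plateau * 0.5 * (1.0 + erf((log_s - mu) / (np.sqrt(2.0) * sigma)))


def fit_completeness(centres, y):
    """Fit plateau, mu (the 50% point) and sigma to a weighted-count histogram."""
    p0 = [np.median(y[len(y) // 2:]), np.median(centres), 0.2]
    popt, _ = curve_fit(erf_model, centres, y, p0=p0)
    return popt


def log_s_at_completeness(fraction, mu, sigma):
    """log S21 at which the fitted completeness equals `fraction` (e.g. 0.9)."""
    return mu + np.sqrt(2.0) * sigma * erfinv(2.0 * fraction - 1.0)
```

In practice one would first split the Code 1 sources into the 32 log W50 bins of step 1 and apply these helpers to each bin. Under this assumed form, the 50% and 25% points lie 1.28σ and 1.96σ below the 90% point, so a common σ across width bins would produce constant offsets of the kind described in step 5.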

The resulting 90% completeness limit (red solid line in the upper panel of Figure 12) for Code 1 sources can be
expressed as:

$$
\log S_{21,90\%,\mathrm{Code1}} =
\begin{cases}
0.5\log W_{50} - 1.14, & \log W_{50} < 2.5 \\
\log W_{50} - 2.39, & \log W_{50} \geq 2.5
\end{cases}
\tag{4}
$$

where S21 is in Jy km s⁻¹ and W50 is in km s⁻¹. As mentioned before, the 50% and 25% completeness limits
occur at a constant offset from the 90% value. The derived offsets for the Code 1 sources only are:

$$
\begin{aligned}
\log S_{21,50\%,\mathrm{Code1}} &= \log S_{21,90\%,\mathrm{Code1}} - 0.067 \\
\log S_{21,25\%,\mathrm{Code1}} &= \log S_{21,90\%,\mathrm{Code1}} - 0.102
\end{aligned}
\tag{5}
$$



Of the 15041 extragalactic objects in the α.40 sample, 3100 are categorized as Code 2 detections (low
signal-to-noise detections with prior optical detection). The lower panel of Figure 12 shows the corresponding
plot of the distribution of sources in the log W50–log S21 plane for the α.40 extragalactic catalog, including
the Code 2 detections, which are shown as green symbols. These additional HI sources are expected to have a
lower detection threshold, as is clearly evident in the lower panel of Figure 12. An analysis identical to the above
can be performed including both the Code 1 and 2 sources, yielding a relation for the combined catalog (red
solid, dash-dotted and dotted lines in the bottom panel of Figure 12):

$$
\log S_{21,90\%} =
\begin{cases}
0.5\log W_{50} - 1.11, & \log W_{50} < 2.5 \\
\log W_{50} - 2.36, & \log W_{50} \geq 2.5
\end{cases}
\tag{6}
$$

and

$$
\begin{aligned}
\log S_{21,50\%} &= \log S_{21,90\%} - 0.130 \\
\log S_{21,25\%} &= \log S_{21,90\%} - 0.202
\end{aligned}
\tag{7}
$$



Excluding the Code 2 sources from the HIMF analysis, as did Martin et al. (2010), guarantees that only the more
confident detections with well-understood selection criteria are used. It could be argued that including
Code 2 sources in the analysis could add value to the determination of the HIMF; this is
discussed further in §7.1. In practice, statistical studies with stringent requirements on sensitivity limits
should use only Code 1 sources and Equation 4. With the proper caution associated with the incomplete
nature of Code 2 sources, the combination of Code 1 and Code 2 sources and Equation 6 can be used in
studies which can benefit from a larger sample.

In both cases, the 50% completeness limit can be considered the "sensitivity limit" of the survey, since
it is the most relevant completeness limit for the derivation of galaxy statistical distributions, such as the
HIMF and the HI width function. Rosenberg & Schneider (2002) have shown that adopting a step-function
cut at the 50% completeness limit of a survey produces approximately the same statistical results as adopting
the survey's full completeness function. The 25% completeness limit can be identified with the "detection
limit" of the survey, that is, the integrated flux density level below which a source has only a small chance of
being detected and cataloged.
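The limits of Equations 4 to 7 and the step-function cut at the 50% limit can be encoded in a few lines of Python. This is a minimal sketch with hypothetical names; it returns only the catalog-averaged limits, not the position-dependent limits discussed in the next paragraph.

```python
import numpy as np

# Equations 4-7: log S21,90% = 0.5 log W50 - a (log W50 < 2.5) or log W50 - b
# (log W50 >= 2.5); the 50% and 25% limits sit at fixed offsets below the 90% limit.
COEFFS = {
    "code1": {"a": 1.14, "b": 2.39, "offset": {0.90: 0.0, 0.50: 0.067, 0.25: 0.102}},
    "code1+2": {"a": 1.11, "b": 2.36, "offset": {0.90: 0.0, 0.50: 0.130, 0.25: 0.202}},
}


def log_s21_limit(w50, level=0.50, sample="code1"):
    """log10 of the completeness limit (Jy km/s) for profile width w50 (km/s)."""
    c = COEFFS[sample]
    log_w = np.log10(np.asarray(w50, dtype=float))
    log_s90 = np.where(log_w < 2.5, 0.5 * log_w - c["a"], log_w - c["b"])
    return log_s90 - c["offset"][level]


def above_sensitivity_limit(s21, w50, sample="code1"):
    """Step-function cut at the 50% completeness ("sensitivity") limit."""
    log_s = np.log10(np.asarray(s21, dtype=float))
    return log_s >= log_s21_limit(w50, 0.50, sample)


mask = above_sensitivity_limit([0.3, 1.0, 2.0], [100.0, 100.0, 600.0])
```

Following the guidance above, `sample="code1"` corresponds to Equations 4 and 5 for stringent statistical work, and `sample="code1+2"` to Equations 6 and 7 for studies that can tolerate the less well-characterized Code 2 selection.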

We remind the reader that the quoted limits given here refer to the full α.40 catalog, and hence are
representative of the average ALFALFA datacube noise properties. However, because of variations in noise
among and within grids and because some localized regions are entirely contaminated by RFI, limits on the
HI flux density at arbitrary positions (e.g. upper limits for non-detections) must be computed individually,
by specific inspection of the spectrum noise properties of the data cubes and their associated "weights grid"
and the continuum maps. It is the availability of such ancillary information which enables the use of the full
ALFALFA dataset for stacking (Fabello et al. 2011) to probe statistical ensembles more deeply.

As the previous-generation blind HI survey, HIPASS (Meyer et al. 2004) set the standard for survey
completeness; by design, ALFALFA was intended to surpass and supersede HIPASS. A reasonable comparison of
the impact of the different source detection schemes (including the absolute level of flux density sensitivity)
may be made by comparing the distribution of the highest-mass galaxies in HIPASS and ALFALFA. For
example, one might have anticipated that the original HIPASS peak-flux density detection scheme (Meyer et al.
2004) could bias the catalog against edge-on (extremely wide) profiles at the highest masses, and such a bias
could explain the finding of Martin et al. (2010) that the HIPASS HIMF underestimated the number
density of the highest mass galaxies. Figure 13 shows a comparison of the distribution of profile widths in
α.40 (open histogram) and HIPASS (filled histogram), for objects with log M_HI/M☉ > 10.0. No obvious
difference which would explain a lack of high-mass sources in the HIPASS catalog is apparent. While the
peak-flux density threshold detection could introduce such a bias, it is apparent that the matched filtering
technique subsequently applied to the HIPASS dataset recovers high-width objects as well as does the technique
used in ALFALFA. Instead, we attribute the lack of extremely high-mass sources in the HIPASS catalog
to that survey's limited redshift extent and its lowered sensitivity near its bandpass redshift limit, both of
which resulted in an inadequately sampled volume and thus an undercounting of the rare, highest mass HI disks.

Furthermore, because of its lower sensitivity, poorer angular and spectral resolution and source detection
scheme, HIPASS was limited in its ability to probe the very low-mass and narrow-width HI sources. The
spectrometer setup employed by HIPASS yielded a raw resolution of 13.2 km s⁻¹ and of 18 km s⁻¹ after
Hanning smoothing; the narrowest objects included in the HIPASS catalog have W50 = 30 km s⁻¹. In
contrast, ALFALFA's velocity resolution is 11 km s⁻¹ after smoothing is applied, and the α.40 catalog thus
includes sources with extremely narrow velocity widths. Although the signal extraction algorithm adopted
by Saintonge (2007a) applied a minimum template width of 30 km s⁻¹, the refined final process of parameter
extraction based on individual examination of the emission region permits finer width estimation. In fact, 289
of the extragalactic objects included in Table 1 have W50 < 30 km s⁻¹. Figure 14 examines the distributions
of low HI mass systems and their profile velocity widths in the two surveys; ALFALFA is clearly superior
in its ability to probe the lowest mass systems. This increased sensitivity to very narrow HI line emission
enhances ALFALFA's ability to probe the lowest HI masses, which in turn robustly constrains the faint-end
slope of the HI mass function, α. In fact, at the lowest HI masses, log M_HI/M☉ < 8.0, the HIPASS catalog
includes only 40 objects while the α.40 catalog contains 339. The ability of ALFALFA to sample narrower
HI line sources is also critical for the derivation of the HI width function and its relation to the halo mass
function (Papastergis et al. 2011).



7. The Impact of ALFALFA Survey Characteristics on Derivation of the HIMF 



In drawing conclusions from blind HI surveys about the HI-selected population in the local universe,
it is critical to understand the biases in the survey due to its sensitivity limits, uncertainties in the HI line
flux densities and distances leading to uncertainties in the derived HI masses, and the impact of large-scale
structure in the survey volume. Toribio et al. (2011b) used a subsample of ALFALFA HI sources identified
in low density environments to establish a standard of normal HI content and performed an analysis of the
completeness of the particular version of the ALFALFA catalog they used. Martin et al. (2010) (see also
Martin 2011) provided an overview of important effects that impact the derivation of the HIMF by two
different methods commonly used to derive mass and luminosity functions, namely the 1/Vmax method and
the two-dimensional stepwise maximum likelihood (2DSWML) method. In the context of applications such
as the derivation of the HIMF by those two methods, we discuss here in greater detail the magnitude and 
character of a. 40 survey properties, its limitations and biases. It is particularly important to understand 
these effects now because we anticipate the '100% ALFALFA survey' to be available in the near future. The 
large increase in the number of galaxies available for that analysis will decrease the statistical uncertainties 
on the measurements, thus amplifying the relative impact of systematics and biases. Additionally, at that 
stage it will be less practical to create thousands of realizations to help understand the various effects. The 
results presented in this Section will provide a baseline and dictate procedure for the final measurement of 
the HI mass function from the completed ALFALFA survey.

7.1. The Limits of ALFALFA: Code 2 "Prior" Sources and the RFI-imposed Redshift Cutoff

Selection effects related to the Code 2 sources in α.40 are poorly determined. Because they require
redshift information derived from other sources, they are subject to the limitations of the availability of such
confirming data. Additionally, ALFALFA's sensitivity as a function of distance is strongly affected by RFI,
especially in the frequency range contaminated by the aviation radar at the San Juan airport (1350 MHz,
corresponding to cz ~ 15600 km s⁻¹). For these reasons, Martin et al. (2010) included only Code 1 objects
detected within 15000 km s⁻¹. Yet it may be argued that the additional information contained in
Code 2 sources, dipping to lower flux limits, could provide additional insight. A first evaluation of the value
added to the HIMF by Code 2 sources relates to the observation that most Code 2 sources fall near M*,
a region of the HIMF well sampled by Code 1 sources: the value added is thus likely to be negligible. We
explore this expectation numerically.



7.1.1. The Code 2 Sources 

Because of the requirement that Code 2 sources be identified with an OC of known (prior) redshift, 
most often contributed by optical/IR surveys like SDSS, those sources may be biased toward overdensities, 
toward those regions of the local volume that have been included in specific targeted or wide-area redshift 
surveys, such as the Virgo Cluster, and in particular toward those regions of the sky that have been covered 
in the spectroscopic catalogs of the SDSS. 

Does the inferred HIMF change if Code 2 sources are included in its derivation? We account for HI
mass and flux density errors by creating 500 realizations of an HIMF that includes sources of both Code 1
and Code 2, and compare those to 100 realizations of the fiducial HIMF published in Martin et al. (2010),
which contained only the Code 1 objects. We use the 2DSWML method, but do not jackknife resample. As
did Martin et al. (2010), we restrict the analysis to the contiguous areas contained in α.40 and limited to
cz < 15000 km s⁻¹. Over the same volume, the inclusion of Code 2 sources increases the sample size used
for this analysis from the 10,021 included in Martin et al. (2010) to 11,177.
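The error model behind such realizations is not spelled out here, so the following Python sketch only illustrates the general idea under stated assumptions: independent Gaussian errors on S21 and on distance, and the standard conversion M_HI = 2.356 × 10⁵ D² S21 (D in Mpc, S21 in Jy km s⁻¹). Each row of the output would then be fed to an HIMF estimator; the function names are hypothetical.

```python
import numpy as np

MHI_COEFF = 2.356e5  # M_HI [Msun] = 2.356e5 * D[Mpc]^2 * S21[Jy km/s]


def hi_mass(s21, dist_mpc):
    """HI mass in solar masses from integrated flux and distance."""
    return MHI_COEFF * np.asarray(dist_mpc) ** 2 * np.asarray(s21)


def mass_realizations(s21, s21_err, dist, dist_err, n_real=500, seed=0):
    """Return an (n_real, n_gal) array of log10 M_HI drawn from Gaussian errors."""
    rng = np.random.default_rng(seed)
    s21, s21_err = np.asarray(s21, float), np.asarray(s21_err, float)
    dist, dist_err = np.asarray(dist, float), np.asarray(dist_err, float)
    s = rng.normal(s21, s21_err, size=(n_real, s21.size))
    d = rng.normal(dist, dist_err, size=(n_real, dist.size))
    # Clip unphysical (non-positive) draws so that the logarithm stays defined.
    s = np.clip(s, 1e-6, None)
    d = np.clip(d, 1e-3, None)
    return np.log10(hi_mass(s, d))


logm = mass_realizations([1.0, 2.0], [0.1, 0.2], [20.0, 50.0], [2.0, 3.0], n_real=500, seed=1)
spread = logm.std(axis=0)  # per-galaxy scatter in log M_HI across realizations
```

Because D enters squared, a fractional distance error contributes twice its size to the mass error, which is one reason distance uncertainties matter as much as flux errors for the HIMF.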



Figure 15 displays the HI mass function found when Code 2 sources are included in the analysis. The
parameters of the function are not strongly affected by the inclusion of these sources. We find
φ* = (4.8 ± 0.3) × 10⁻³ h70³ Mpc⁻³ dex⁻¹, log(M*/M☉) + 2 log h70 = 9.96 ± 0.02 and α = −1.29 ± 0.02. These
correspond to Ω_HI = (4.1 ± 0.3) × 10⁻⁴ h70⁻¹ found by integrating the Schechter function, or Ω_HI = (4.2 ± 0.1)
× 10⁻⁴ h70⁻¹ when summing the binned measurements directly. The fiducial HIMF which includes only Code
1 objects, as reported by Martin et al. (2010), finds φ* = 4.8 ± 0.3, log(M*/M☉) = 9.96 ± 0.02, α = −1.33 ±
0.02, Ω_HI (analytical) = 4.3 ± 0.3, and Ω_HI (summed) = 4.4 ± 0.1, all in the same units as expressed for
the results with both Code 1 and 2 sources.
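As a consistency check on the analytic values, Ω_HI follows from integrating the Schechter function. Assuming φ* is per dex, so that φ(M) dlog M = ln(10) φ* (M/M*)^(α+1) exp(−M/M*) dlog M, the HI mass density is ρ_HI = φ* M* Γ(α + 2), and Ω_HI = ρ_HI/ρ_crit with ρ_crit = 2.775 × 10¹¹ h² M☉ Mpc⁻³ (h = 0.7 here). The function name in this Python sketch is hypothetical.

```python
import math

RHO_CRIT_H2 = 2.775e11  # critical density in units of h^2 Msun Mpc^-3


def omega_hi(phi_star, log_m_star, alpha, h=0.7):
    """Omega_HI from a Schechter HIMF with phi_star in Mpc^-3 dex^-1.

    Integrating M * phi(M) over all masses gives rho_HI = phi* M* Gamma(alpha + 2),
    which requires alpha > -2 for the integral to converge.
    """
    rho_hi = phi_star * 10.0 ** log_m_star * math.gamma(alpha + 2.0)  # Msun Mpc^-3
    return rho_hi / (RHO_CRIT_H2 * h ** 2)


omega_code12 = omega_hi(4.8e-3, 9.96, -1.29)  # Code 1+2 parameters
omega_code1 = omega_hi(4.8e-3, 9.96, -1.33)   # Code 1 only (Martin et al. 2010)
```

With the parameters quoted above, this gives values close to 4.1 × 10⁻⁴ and 4.3 × 10⁻⁴, respectively, and it makes explicit why the flatter Code 1+2 slope lowers Ω_HI even though φ* and M* are unchanged: Γ(α + 2) decreases as α becomes less negative in this range.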

Encouragingly, these results indicate that the ALFALFA survey's detection coding scheme does not 
systematically exclude significant sources of HI gas energy density in the local universe. Rather, the
agreement between the Code 1 and the Code 1+2 HIMFs suggests that our robust understanding of the survey's
sensitivity extends to those weaker sources identified as Code 2 objects. 

The only potentially significant impact is on the faint end of the HIMF, influencing both the slope
and the points there. The slope parameter α is flattened in the Code 1+2 case, though the two values are
just barely within 1σ of each other. In Figure 16 we compare the residuals (the best-fit, fiducial HIMF
Schechter model, subtracted from the binned data) for the case where we consider only Code 1 objects (top
panel) and the Code 1+2 case (bottom panel). The figure clearly demonstrates that the Code 1+2 HIMF
measures fewer low-mass objects per unit volume, thereby yielding a flatter slope. This is unsurprising because,
in comparison to HI surveys, optical surveys undersample dwarf, low surface brightness galaxies. The typical
Code 2 detection is a galaxy near L*, and the redshift distribution of Code 2 sources lacks the smattering
of low redshift objects present in an HI-selected sample. As a result, Code 2 sources add very few additional objects
at the lowest redshifts, at which low HI masses are detectable. This is an example of the fact that adding
Code 2 sources to the sample is more likely to subtract than to add value to the result: "more is less".

We note that in the process of source extraction, a second set of marginal HI line detections has been 
identified which coincide with possible OCs for which no redshift measurement is available. Because the
reality of these detections is still too uncertain, they are not included in the current α.40 catalog. Follow-up
observations, to be made after the main survey is completed, will be undertaken to confirm the
reality of the HI line detections. This program will contribute additional low HI line flux density sources to
the final ALFALFA catalog in this region of the sky.



7.1.2. The Full Redshift Extent of the ALFALFA Survey 



(Unfortunately) we live on a planet occupied by technologically active humans. Figure 6 of Martin et al.
(2010) illustrates the relative spectral weight within the 40% ALFALFA survey volume as a function of
observed heliocentric velocity. A relative weight of 1.0 indicates that the entire surveyed volume was accessible
for source extraction and produced high-quality data. As is also evident in the deficit of sources near a
distance of ~225 Mpc seen in Figure 3, the FAA radar at the San Juan Airport contaminates the frequencies
corresponding to sources at cz between 15000 and 16000 km s⁻¹, rendering the detection of sources in this
range impossible when the transmitter is on. Beyond 16,000 km s⁻¹, ALFALFA's sensitivity recovers, but
at the corresponding distance, it is sensitive only to the most massive of galaxies. As a result, this distant
volume contributes only a small number of galaxies to the overall α.40 sample.

For these reasons, the analysis of the HIMF in Martin et al. (2010) neglected galaxies beyond 15000
km s⁻¹, so that the results would not be influenced by the large spectral weight gap. This exclusion was
especially important in the case of the 2DSWML method, since the 1/Vmax method allows the inclusion of
explicit corrections for known missing volumes. 2DSWML, by contrast, determines the shape of the HIMF
by comparing counts in HI mass bins to a built-in description of ALFALFA's flux density sensitivity as a
function of distance and width. The large gap, which is not anticipated by this approach, might have caused
problems in the analysis had those objects been included. Martin et al. (2010) felt it was safer to limit the first
measurement of the HIMF to regions where the spectral weights are relatively smooth, that is, to galaxies
within 15000 km s⁻¹. Here, we revisit the issue and consider the influence, if any, of including the full
redshift extent of the survey in the HIMF analysis.

In particular, we would expect the increased bin counts at the very highest HI masses to increase the statistical significance of the HIMF measurement there. This possibility is of interest because Martin et al. (2010) determined that ALFALFA is more sensitive to high-mass galaxies than HIPASS was. Their HIMF results indicated that previous blind HI surveys have missed a significant percentage of the most massive HI disks. To test this possibility, we calculated the HIMF with both methods using all of the Code 1 sources out to 18000 km s⁻¹. For each method, we created 250 realizations of the survey to account for flux density, distance, and mass errors. Table 5 displays the fit parameters and Ω_HI values for both the 1/Vmax and 2DSWML methods.

The 2DSWML result is distorted, likely because of the inaccessible volume and the inability of this method to correct for it. It drastically underestimates Ω_HI, shifts log(M*/M☉) to a higher value, and flattens the low-mass slope α. The 1/Vmax method, on the other hand, continues to function as expected and yields a reliable measurement. The two 1/Vmax results, with and without the galaxies beyond 15000 km s⁻¹, are consistent with each other, including the measured values of Ω_HI. The 2DSWML method, however, performs poorly in the presence of the redshift gap. This result confirms the decision by Martin et al. (2010) to limit their analysis to the volume within cz < 15000 km s⁻¹.
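The Ω_HI values quoted with the Schechter fit parameters follow from integrating the fitted mass function. The sketch below assumes the standard Schechter form φ(M) dM = φ* (M/M*)^α exp(−M/M*) dM/M*. With that form, ρ_HI = φ* M* Γ(α + 2), which is finite only for α > −2. Dividing by the critical density ρ_crit ≈ 2.775 × 10¹¹ h² M☉ Mpc⁻³ gives Ω_HI. The parameter values in the example are hypothetical and are not the fits listed in Table 5.

```python
import math

RHO_CRIT_H2 = 2.775e11  # critical density in h^2 Msun Mpc^-3


def schechter(m_hi, phi_star, m_star, alpha):
    """Schechter function phi(M) in Mpc^-3 Msun^-1."""
    x = m_hi / m_star
    return phi_star * x**alpha * math.exp(-x) / m_star


def omega_hi(phi_star, log_m_star, alpha, h=0.7):
    """Omega_HI = phi* M* Gamma(alpha + 2) / rho_crit; finite only for alpha > -2."""
    if alpha <= -2.0:
        raise ValueError("the HI mass density diverges for alpha <= -2")
    m_star = 10.0**log_m_star
    rho_hi = phi_star * m_star * math.gamma(alpha + 2.0)  # Msun Mpc^-3
    return rho_hi / (RHO_CRIT_H2 * h**2)


# Hypothetical parameters (phi* in Mpc^-3), for illustration only.
print(f"Omega_HI = {omega_hi(phi_star=5e-3, log_m_star=9.9, alpha=-1.3):.2e}")
```

The Γ(α + 2) factor explains why a biased α also biases Ω_HI. It also explains why a fit that both flattens α and misplaces M* can shift the inferred Ω_HI substantially.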



This analysis provides further evidence of the relative strengths and weaknesses of the two available methods for estimating the HIMF. The 2DSWML approach is a powerful statistical tool, but it functions as a 'black box' that cannot be adjusted using additional knowledge of the survey. In some cases this may be an advantage. For ALFALFA, however, we have detailed information about the survey volume, the survey sensitivity, and other factors contributing to the HIMF, so the 1/Vmax method provides a clearer path and a more understandable answer.



7.2. Uncertainties in the HI Mass 



On the low HI mass end, uncertainties in the conversion from HI line flux density to HI mass are the primary source of error in the HIMF. Most HIMF results in the literature do not account for this, but the error analysis undertaken here and by Martin et al. (2010) takes it explicitly into account. The HIMF is constructed by binning galaxies by HI mass and then treating each bin as an independent data point. For that reason, it is not straightforward to carry HI mass uncertainties through analytically. Instead, the uncertainties in the ALFALFA HIMF due to mass errors are calculated from many hundreds of realizations, each with randomly assigned mass (i.e., distance and flux density) errors. Here, we elaborate on three points: the distance estimation scheme used in ALFALFA, the biases that alternative schemes (e.g., a pure Hubble flow model) would introduce, and the overall impact of distance and flux density errors on the HI mass estimates used to construct the HIMF.
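The realization-based error propagation described above can be sketched as follows. The conversion M_HI = 2.356 × 10⁵ D² S21 (D in Mpc, S21 in Jy km s⁻¹) is the standard optically thin relation. It reproduces, for example, the tabulated log M_HI = 9.29 of AGC 331061 in Table 1 from D = 85.2 Mpc and S21 = 1.13 Jy km s⁻¹. The following are assumptions for illustration, not the exact ALFALFA procedure: the Gaussian error model, the clipping of unphysical draws, the hypothetical helper names, and the 10% distance errors in the example.

```python
import numpy as np


def hi_mass(dist_mpc, s21_jykms):
    """Optically thin HI mass in Msun: M_HI = 2.356e5 * D^2 * S21."""
    return 2.356e5 * np.asarray(dist_mpc, float) ** 2 * np.asarray(s21_jykms, float)


def log_mass_realizations(dist, sig_dist, s21, sig_s21, n_real=250, seed=0):
    """(n_real, n_gal) array of log10 M_HI with independent Gaussian draws
    of distance and flux density in every realization (assumed error model)."""
    rng = np.random.default_rng(seed)
    dist = np.asarray(dist, float)
    s21 = np.asarray(s21, float)
    d = rng.normal(dist, sig_dist, size=(n_real, dist.size))
    s = rng.normal(s21, sig_s21, size=(n_real, s21.size))
    # Clip non-physical negative draws so log10 stays finite (an assumption).
    d = np.clip(d, 1e-3, None)
    s = np.clip(s, 1e-6, None)
    return np.log10(hi_mass(d, s))


def bin_count_scatter(log_m, edges):
    """Mean and standard deviation, across realizations, of the counts per mass bin."""
    counts = np.array([np.histogram(row, bins=edges)[0] for row in log_m])
    return counts.mean(axis=0), counts.std(axis=0)


# Three Table 1 entries (AGC 331061, 102728, 102896) with hypothetical 10% distance errors.
dist = np.array([85.2, 9.1, 227.4])
s21 = np.array([1.13, 0.31, 2.37])
sig_s21 = np.array([0.09, 0.03, 0.12])
log_m = log_mass_realizations(dist, 0.1 * dist, s21, sig_s21)
mean_counts, std_counts = bin_count_scatter(log_m, np.arange(6.0, 11.01, 0.5))
```

The per-bin standard deviation across realizations is what enters the HIMF error bars. Galaxies near bin edges migrate between bins from one realization to the next.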

Martin et al. (2010) described the distance estimation scheme adopted for ALFALFA; we summarize it briefly here. When distances are based on the adopted flow model, we employ the model's error estimates, which are constrained by the fit of the model to the observed velocity field. When distances are estimated from pure Hubble flow, the error is taken to be ~10%. We fix a minimum error of 163 km s⁻¹, based on the local velocity dispersion measured by Masters (2005). Using the full suite of available information to estimate distances matters. To demonstrate this, Figure 17 compares the primary distances (used in α.40) to the values that would be obtained assuming pure Hubble flow.

In their estimate of galaxy masses for the HIMF, the HIPASS team assumed Hubble flow. This is not a safe assumption, particularly in the regions of the sky surveyed in α.40. The Virgo Cluster, in particular, represents a strong deviation from any assumed relationship between distance and recessional velocity. Masters et al. (2004) showed the danger of assuming pure Hubble flow, especially given the small redshifts accessible to blind HI surveys. They concluded that the low-mass slope of the HIPASS HIMF was underestimated because peculiar velocities were neglected. They also predicted that a survey in the direction of Virgo could severely underestimate the low-mass slope.

Given the large-scale structure in the α.40 volume, we would expect the HIMF to vary strongly with a poor choice of distance estimate. To test this, we recalculated the 2DSWML estimate of the HIMF using pure Hubble flow to estimate distances. We converted the observed heliocentric velocities into the CMB rest frame and then assumed D_Mpc = cz_CMB/H0, with the ALFALFA standard H0 = 70 km s⁻¹ Mpc⁻¹. In this case we have no ideal estimate of the distance error. We therefore use 10% of the Hubble flow distance or the local dispersion value of 163 km s⁻¹, whichever is greater. As usual, flux density errors are also folded into the mass uncertainties. Once again, we create 250 realizations to estimate uncertainties.
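The following sketch gives the Hubble-flow distance and its error floor. It assumes the velocities are already in the CMB frame (the heliocentric-to-CMB conversion is not shown). It also assumes the 163 km s⁻¹ floor is turned into a distance by dividing by H0, which is our reading of the text. With these choices, the 10% term takes over above cz = 1630 km s⁻¹.

```python
H0 = 70.0        # km s^-1 Mpc^-1, the ALFALFA standard adopted in the text
SIGMA_V = 163.0  # km s^-1, local velocity dispersion used as the error floor


def hubble_flow_distance(cz_cmb_kms):
    """Pure Hubble-flow distance in Mpc from a CMB-frame recession velocity."""
    return cz_cmb_kms / H0


def hubble_flow_distance_error(cz_cmb_kms):
    """Larger of 10% of the distance and the 163 km/s floor converted to Mpc (assumed)."""
    return max(0.10 * hubble_flow_distance(cz_cmb_kms), SIGMA_V / H0)


for cz in (700.0, 1630.0, 7000.0):
    print(cz, hubble_flow_distance(cz), round(hubble_flow_distance_error(cz), 2))
```

For the nearby volume that dominates the low-mass end, the floor sets a distance error of ~2.3 Mpc regardless of distance. This is a large fractional error for a galaxy at a few Mpc.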

Figure 18 displays the resulting HI mass function and Schechter fit parameters. As anticipated (Masters et al. 2004), using Hubble flow causes a serious underestimate of the faint-end slope α. ALFALFA's success at robustly measuring the HIMF therefore depends on two things: a large sample over a cosmologically significant volume, and a reasonable model for distance estimation.

Masters et al. (2004) showed that distance errors have a large impact on the HIMF and its uncertainties. It is therefore reasonable to ask how large the HI mass errors are in the α.40 sample once both distance and flux density errors are taken into account. To obtain robust estimates, we created many thousands of realizations of each galaxy in the α.40 sample and applied distance and flux density errors. Figure 19 displays the result. Along the abscissa is the HIMF mass bin each galaxy would nominally fall into, assuming perfect measurements of distance and flux density. This is compared with the mean mass of the galaxies assigned to that bin once realistic uncertainties are included. The horizontal error bars indicate the 1σ spread of potential 'true' masses falling into the nominally assigned mass bins.

The figure shows that ALFALFA's measurement of the HIMF and Ω_HI is not prone to large uncertainties above 10^8.0 M☉. Below 10^8 M☉, galaxies can easily be mis-assigned to bins, even when a realistic distance model is used. This is the mass range of interest to the missing satellites problem, dwarf galaxy studies, and the low HI mass slope of the HIMF. Depending on the large-scale structure in the survey volume, this effect can lead to either an over- or an underestimate of α. We therefore take great care to account, conservatively, for mass uncertainties.
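A first-order propagation of both error sources through the standard relation M_HI = 2.356 × 10⁵ D² S21 shows why the low-mass bins are the vulnerable ones. The propagation assumes small, independent distance and flux density errors. That approximation breaks down for the nearest objects, which is one reason to use realizations rather than this formula:

$$
\sigma_{\log M_{\rm HI}} \simeq \frac{1}{\ln 10}\left[\left(\frac{2\,\sigma_D}{D}\right)^{2} + \left(\frac{\sigma_{S_{21}}}{S_{21}}\right)^{2}\right]^{1/2}
$$

The distance term enters with a factor of 2. As an illustration with hypothetical errors, σ_D/D = 0.10 and σ_S21/S21 = 0.08 give about 0.09 dex. Now consider a dwarf at D = 5 Mpc, where the 163 km s⁻¹ floor corresponds to σ_D ≈ 2.3 Mpc (assuming conversion at H0 = 70 km s⁻¹ Mpc⁻¹). It has 2σ_D/D ≈ 0.93 and hence an error of roughly 0.4 dex, easily enough to move it between HI mass bins.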



7.3. The Impact of Large-Scale Structure in α.40

Blind HI surveys are relatively shallow, with ALFALFA probing the local universe only out to z ~ 0.06. As a result, inhomogeneity in the survey volume has a strong impact on the derived HI mass function. This is particularly true for the 1/Vmax method, which is less robust against large-scale structure, but the 2DSWML method is not completely immune from these effects either. The usual statistical test of the homogeneity of a sample is the V/Vmax test (Schmidt 1968). Much like the 1/Vmax method, this test considers the maximum volume out to which each source in a survey can be detected. It compares the volume enclosed at the distance where each source was actually detected with that source's accessible volume. For a homogeneous sample, the expectation value ⟨V/Vmax⟩ is 0.5.
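The sketch below implements the test assuming Euclidean geometry and a single solid angle, so that V/Vmax = (D/Dmax)³. In a homogeneous sample V/Vmax is uniform on [0, 1], which gives ⟨V/Vmax⟩ = 0.5 with a purely statistical standard error of 1/√(12N). For a sample of order 10⁴ galaxies that error is only about 0.003. The offset reported below for α.40 therefore reflects structure rather than counting noise. In ALFALFA each source's Dmax would follow from its HI mass and width-dependent flux limit; this sketch simply takes Dmax as given.

```python
import numpy as np


def v_over_vmax(dist, dist_max):
    """V/Vmax for a fixed solid angle in Euclidean space: (D / Dmax)^3."""
    return (np.asarray(dist, float) / np.asarray(dist_max, float)) ** 3


def schmidt_test(dist, dist_max):
    """Mean V/Vmax and its expected 1-sigma error, 1/sqrt(12 N), for a uniform sample."""
    ratio = v_over_vmax(dist, dist_max)
    return ratio.mean(), 1.0 / np.sqrt(12.0 * ratio.size)


# Homogeneous mock: each source placed uniformly in volume out to its own Dmax.
rng = np.random.default_rng(1)
d_max = rng.uniform(50.0, 250.0, size=20000)
d = d_max * rng.uniform(size=d_max.size) ** (1.0 / 3.0)
mean_ratio, err = schmidt_test(d, d_max)
print(f"<V/Vmax> = {mean_ratio:.3f} +/- {err:.3f}")
```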

For the α.40 volume, ⟨V/Vmax⟩ = 0.45. This indicates that, at 40% completion, the survey does not yet contain enough volume to fully 'smooth out' the effects of large-scale structure. Figure 20 reflects this by showing ⟨V/Vmax⟩ for each bin of HI mass. The most obvious feature in this figure is the dip near log(M_HI/M☉) ~ 8.4, which is due to overdensities in the sample volume, primarily the Virgo Cluster. Galaxies of these masses are found preferentially in those overdense regions rather than filling the full volume in which ALFALFA could detect them, and this produces the dip.

Clearly α.40 does not yet constitute a representative slice of the universe; as the survey progresses, we anticipate that the full sample will pass the V/Vmax test. Another, perhaps more intuitive, way to view the impact of voids and clusters in α.40 is to compare the redshift distribution of cataloged galaxies to the prediction based on the survey's selection function. The selection function is the percentage of galaxies at a given distance that are detectable by ALFALFA, and it is determined by the 2DSWML analysis of the HIMF. Combined with the measured HIMF, it predicts the redshift distribution for a homogeneously distributed set of galaxies drawn from the HIMF.
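The following is a simplified version of this prediction. For a homogeneous population, the number of detections in a shell at distance D scales as D² times the number density of galaxies massive enough to be detected there. The sketch replaces the 2DSWML-derived selection function with a hypothetical sharp limit S_lim on integrated flux density; ALFALFA's real limit also depends on line width. The Schechter parameters used are hypothetical as well.

```python
import numpy as np


def schechter_per_dex(log_m, phi_star, log_m_star, alpha):
    """Schechter function per dex of HI mass (Mpc^-3 dex^-1)."""
    x = 10.0 ** (log_m - log_m_star)
    return np.log(10.0) * phi_star * x ** (alpha + 1.0) * np.exp(-x)


def predicted_counts(dist_mpc, s_lim, phi_star, log_m_star, alpha):
    """Relative detections per unit distance for a homogeneous population:
    D^2 times the number density above M_lim(D) = 2.356e5 D^2 S_lim (hypothetical sharp limit)."""
    grid = np.linspace(5.0, 12.0, 7001)  # log10 M_HI integration grid
    step = grid[1] - grid[0]
    phi = schechter_per_dex(grid, phi_star, log_m_star, alpha)
    out = []
    for d in np.asarray(dist_mpc, float):
        log_m_lim = np.log10(2.356e5 * d**2 * s_lim)
        n_above = phi[grid >= log_m_lim].sum() * step  # simple Riemann sum
        out.append(d**2 * n_above)
    return np.array(out)


# Hypothetical inputs: S_lim = 0.5 Jy km/s, phi* = 5e-3 Mpc^-3, log M* = 9.9, alpha = -1.3.
dists = np.array([20.0, 100.0, 250.0])
counts = predicted_counts(dists, 0.5, 5e-3, 9.9, -1.3)
print(counts / counts.max())
```

The smooth rise-and-fall curve from such a calculation is the baseline. Excesses and deficits of the observed histogram relative to it trace real structures.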

Figure 21 compares this expectation to the observations in α.40. The bumps and dips in the histogram represent over- and underdensities, respectively, in the survey volume. For example, the Virgo Cluster explains the enhancement near 1000 km s⁻¹. The Pisces-Perseus Supercluster and its foreground void also leave clear imprints on this figure.



7.3.1. Subregions of the α.40 catalog

If α.40 is not a representative sample of the universe, then statistical studies of the sample's characteristics, such as the HIMF, may be subject to biases from large-scale structure. Because the catalog is large, we can assess this impact by analyzing separate subregions of it. The α.40 sample consists of three large, contiguous areas. In the Northern Galactic hemisphere, α.40 covers 07h30m < R.A. < 16h30m in two separate blocks, +4° < Dec. < +16° and +24° < Dec. < +28°. We refer to these subregions as α.40.North1 and α.40.North2, respectively. In the Southern Galactic hemisphere, α.40 covers 22h < R.A. < 03h, +24° < Dec. < +32°, referred to as α.40.South. The combined α.40 catalog covers enough cosmological volume that the effects of large-scale structure on the derivation of the HIMF begin to become minimal. Reducing its coverage further, however, leads to a situation in which the HIMFs derived for individual subregions are strongly affected by over- and underdensities within their volumes.

Figure 22 displays the HIMFs for the three subregions, α.40.North1, North2, and South, from top to bottom. Table 5 gives the fit parameters and values of Ω_HI, along with the fiducial 2DSWML HIMF for the entire α.40 sample for comparison. α.40.North1 is the largest subregion by a significant margin and contributes over 50% of the 10,000 galaxies in α.40. As expected, the HIMF for this region, when isolated, follows the HIMF for the sample as a whole. Because of the large volume of this region, the HIMF in the top panel of Figure 22 is smooth and featureless.

In the smaller samples shown in the middle and bottom panels of Figure 22, features due to large-scale structure are clearly visible. Because the surveyed volume is inhomogeneous, these HIMFs do not follow the prescribed Schechter function. For the α.40.North2 subsample, the faint-end slope is better fit on its own, which gives α = -1.4 ± 0.1.

In every case, the bumps and wiggles in the subregion HIMFs correspond to features in the cone diagram distributions of Martin et al. (2010). In essence, the ALFALFA survey's sensitivity combined with the scaling of survey volume with redshift gives each HI mass bin in the HIMF a preferred distance (or, equivalently, each distance in the survey a preferred HI mass). A dip in the HIMF, for example, corresponds to an overdensity at the preferred distance for that HI mass scale. The 2DSWML method has been designed to be less sensitive to large-scale structure, but the volumes of these subregions are too small for these effects to average out.
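The idea of a "preferred distance" can be made concrete with a simplified flux-limited picture. The sketch assumes a hypothetical sharp limit S_lim with no width dependence and Euclidean volumes. Under those assumptions a galaxy of mass M is detectable out to D_lim = [M / (2.356 × 10⁵ S_lim)]^{1/2}. Because volume grows as D³, half of the detectable volume for that mass lies beyond 2^{−1/3} D_lim ≈ 0.79 D_lim. A mass bin's counts are therefore dominated by distances just inside D_lim, and a structure at that distance imprints itself on that bin.

```python
import numpy as np


def detection_limit_distance(log_m_hi, s_lim):
    """Distance (Mpc) at which M_HI = 2.356e5 D^2 S_lim (hypothetical sharp flux limit)."""
    return np.sqrt(10.0 ** np.asarray(log_m_hi, float) / (2.356e5 * s_lim))


def median_detection_distance(log_m_hi, s_lim):
    """Median distance of detectable sources spread uniformly in volume: 2**(-1/3) * D_lim."""
    return 2.0 ** (-1.0 / 3.0) * detection_limit_distance(log_m_hi, s_lim)


# Hypothetical S_lim = 0.5 Jy km/s for three HI mass bins.
for log_m in (7.0, 8.4, 10.0):
    print(log_m, detection_limit_distance(log_m, 0.5), median_detection_distance(log_m, 0.5))
```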

Such techniques can only work with the data they are given, but the 1/Vmax approach allows explicit correction for known structures. When these corrections are included in the 1/Vmax analysis of these subregions, the (unshown) results are very similar to those provided here. The corrections are based on the density correction derived from the IRAS Point Source Catalog redshift survey (PSCz; Branchini et al. 1999; see §7.3.2). Imperfections in this correction, however, lead to the same bump and dip features. A further weakness of the 1/Vmax density correction is that counts can only be increased for galaxies that actually end up in the sample, which makes the correction much less useful in voids. 2DSWML, by contrast, essentially 'self-corrects' for over- and underdense regions. Rather than looking at volumes and scaling counts by 1/Vmax, 2DSWML constructs the relationship between bins by scaling the counts themselves. It therefore automatically scales the HIMF downward for overdense regions and upward for underdense regions.

This consideration of subregions within α.40 makes clear how strongly large-scale structure can affect blind HI surveys, and why cosmologically significant volumes must be surveyed before global conclusions can be drawn.



7.3.2. Large-scale Structure Correction from Previous Surveys 



As described in Martin et al. (2010), the 1/Vmax method can be corrected to account for large-scale structure in the survey volume. Essentially, overdense regions are treated as representing more effective volume (Σ1/V_eff rather than Σ1/V_max), and underdense regions less, so that galaxies in various environments are weighted appropriately (Springob et al. 2005b; Rosenberg & Schneider 2002).



This correction is successful, but it relies on datasets external to the ALFALFA survey. Martin et al. (2010) used the density map derived from the PSCz (Branchini et al. 1999) to correct for large-scale structure. Other options exist, in particular other PSCz maps (smoothed to different levels) and the density reconstruction derived from the 2MASS Redshift Survey (2MRS; Erdoğdu et al. 2006). The choice of large-scale structure correction has a large influence on the final HIMF estimate: Martin et al. (2010) reported a ~20% effect on the Schechter parameters compared to neglecting the density correction. Given the magnitude of the effect, it is important to consider the impact a different choice would have. This portion of the HIMF analysis is likely always to rely on external information, so examining it here may also help with the future 100% ALFALFA sample.

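The parameter of interest reported by PSCz is the overdensity δ, defined relative to the average number density of galaxies found in those surveys:

$$\delta \equiv \frac{n - \bar{n}}{\bar{n}}, \qquad (8)$$

where n is the local galaxy number density and n̄ is its survey average.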



For the ALFALFA survey and the HIMF, we are primarily interested in the average value of δ interior to each source's maximum detectable distance, that is, the average value of δ in the volume over which the source could have been detected. Both the 2MRS and PSCz density maps report the value of δ in equal-volume cells throughout their survey volumes. We chose the PSCz map because of its greater sensitivity in the nearby survey regions of α.40, where the HIMF is especially vulnerable to the impact of large-scale structure.
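The correction is sketched below using a hypothetical radial overdensity profile δ(r) sampled on equally spaced shells. This profile stands in for the PSCz cells along each source's sightline. The average interior overdensity is the volume-weighted mean δ̄(<D_max), and the source's accessible volume is rescaled to V_eff = V_max (1 + δ̄). Galaxies whose detection volume is dominated by an overdensity therefore receive less weight (1/V_eff), and those in underdense volumes receive more. This paraphrases the approach described above; it is not the code used by Martin et al. (2010).

```python
import numpy as np


def mean_interior_delta(r_shell, delta_shell, r_max):
    """Volume-weighted mean overdensity inside r_max.

    Shells are assumed equally spaced, so each shell's volume weight is
    proportional to r^2.
    """
    r = np.asarray(r_shell, float)
    delta = np.asarray(delta_shell, float)
    inside = r <= r_max
    if not inside.any():
        raise ValueError("r_max is smaller than the innermost shell")
    w = r[inside] ** 2
    return float(np.sum(w * delta[inside]) / np.sum(w))


def effective_volume(v_max, mean_delta):
    """V_eff = V_max * (1 + mean interior overdensity)."""
    return v_max * (1.0 + mean_delta)


# Hypothetical profile: a strong nearby overdensity inside 20 Mpc, mean density beyond.
r = np.arange(0.5, 300.0, 1.0)
delta = np.where(r < 20.0, 2.0, 0.0)
for r_max in (15.0, 50.0, 200.0):
    d_bar = mean_interior_delta(r, delta, r_max)
    print(r_max, round(d_bar, 3), effective_volume(1.0, d_bar))
```

A low-mass galaxy, whose D_max lies close to a nearby cluster, has a large δ̄ and a strongly enlarged V_eff. For this reason the choice of density map matters most at the faint end.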

PSCz was a good choice for the analysis of the α.40 HIMF, but Branchini et al. (1999) actually provide several maps. The primary differences among them are the smoothing scale applied to each volume cell and the maximum distance out to which the density fields were reconstructed. The map chosen by Martin et al. (2010) extended to 240 h⁻¹ Mpc and was smoothed with a Gaussian kernel of width 3.2 h⁻¹ Mpc. The alternatives include a map that extends to only 120 h⁻¹ Mpc with a 3.2 h⁻¹ Mpc kernel, and one that extends to 240 h⁻¹ Mpc with a larger Gaussian kernel of 7.7 h⁻¹ Mpc. Smoothing of the PSCz maps can lead to underestimation of density contrasts. The large-scale structure correction mainly affects the lowest-mass bins of the HIMF, so it is important to explore and understand the influence of this outside dataset.
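A one-dimensional toy shows how contrast is lost with increasing smoothing scale. A hypothetical 3 h⁻¹ Mpc-wide clump with δ = 5 is convolved with Gaussians whose standard deviations are 3.2 and 7.7 h⁻¹ Mpc. Treating the quoted kernel "widths" as σ is an assumption. Real PSCz smoothing is three-dimensional, where the dilution of a compact peak is stronger still.

```python
import numpy as np


def gaussian_smooth_1d(field, dx, sigma):
    """Convolve a 1-D field sampled at spacing dx with a normalized Gaussian of std sigma."""
    half = int(np.ceil(5.0 * sigma / dx))
    x = np.arange(-half, half + 1) * dx
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(field, kernel, mode="same")


dx = 0.1                                     # grid spacing (h^-1 Mpc)
x = np.arange(-100.0, 100.0, dx)
delta = np.where(np.abs(x) < 1.5, 5.0, 0.0)  # hypothetical compact overdensity
peak_fine = gaussian_smooth_1d(delta, dx, 3.2).max()
peak_coarse = gaussian_smooth_1d(delta, dx, 7.7).max()
print(peak_fine, peak_coarse)
```

The total excess is conserved by the convolution, but it is spread over a larger volume. A nearby cluster therefore looks weaker in a more heavily smoothed map, which in turn changes the interior overdensities assigned to galaxies around it.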

Examining the average interior overdensities determined for α.40 in each map, we find that the PSCz.240.G3.2 map represents an extreme estimate of the impact of large-scale structure within the survey volume. This is the map Martin et al. (2010) used for the 1/Vmax analysis of the HIMF. It is the most conservative option because it attaches lower weight to galaxies found in nearby overdensities, particularly the Virgo Cluster, which prevents them from artificially boosting the faint-end slope.

To quantify the effect of these options on the resulting HIMF, we re-analyze the α.40 HIMF using the PSCz.120.G3.2 and PSCz.240.G7.7 maps. Where the maps do not reach the full redshift extent of the α.40 sample, we set the average interior δ to for galaxies beyond the distance limit. To fit Schechter function parameters in each case, we use the same uncertainty estimates for each HI mass bin as presented in Martin et al. (2010). The applied PSCz map changes only the magnitude of each point, not its fluctuation due to HI mass uncertainties.

Figure 23 shows the results. It focuses on the low-mass end of the HIMF, because HI mass bins with M_HI > 10^8.0 M☉ are not affected by the large-scale structure volume correction. The different large-scale structure corrections effectively act as a scaling in each bin, so each option closely follows the fiducial case. Both PSCz.120.G3.2 and PSCz.240.G7.7 boost the faint-end slope, indicating that they overcount galaxies in the nearby overdensities, namely the Virgo Cluster. This analysis verifies that PSCz.240.G3.2 was the most conservative choice for correcting the 1/Vmax HIMF for the effects of large-scale structure. Table 5 displays the changes to the low-mass slope α and the turnover mass M*, along with the measured 2DSWML parameters for reference. The PSCz map with the greatest extent and the smallest smoothing radius is clearly the most appropriate for estimating the α.40 HI mass function.



8. Summary 



This paper presents the cataloged parameters for 15855 HI line detections extracted from ~2800 deg² of high Galactic latitude sky observed by the ALFALFA survey. A (pleasant) surprise for us has been the higher than expected ALFALFA detection rate: 5.6 sources per deg² or, counting only the objects that are certainly extragalactic, 5.3 sources per deg². This latter rate is a factor of 29 greater than the rate of 0.18 sources per deg² achieved by HIPASS. The characteristic resolution of the ALFALFA spectral grids is about 4′, and the positions of the HI sources can be determined to an accuracy typically better than 20″. Using the publicly available SDSS and DSS2 imaging datasets, we have assigned probable optical counterparts to more than 98% of the 15041 extragalactic detections. We also provide a cross-reference to the SDSS DR7 photometric and spectroscopic databases. An additional 814 HI line detections cannot be identified with stellar counterparts but lie within velocity ranges characteristic of Galactic/circumgalactic HVCs. Roughly 3/4 of the optically "dark" extragalactic HI sources are located in fields containing galaxies of known optical redshift; many are likely to be associated with tidal debris fields.

We identify four objects as candidate OH megamasers redshifted to z ~ 0.17. One of these is a rediscovery of a previously recognized OHM and is associated with a galaxy of the same optical redshift (Darling & Giovanelli 2001). Future work will explore the OHM candidates throughout the ALFALFA bandpass more systematically and will also search for evidence of HI in absorption (Darling et al. 2011). Unsurprisingly, a census of the HI-bearing population of galaxies in the local universe is strongly biased against galaxies on the red sequence, but some luminous, red galaxies are detected in the HI line. In particular, ALFALFA provides a rich sampling of the low-to-moderate density universe at z ~ 0.
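An OH megamaser can masquerade as an HI source because its 1667 MHz main line, redshifted, falls into the 21 cm band. The sketch converts between the two interpretations, assuming the optical velocity convention (cz = c z). Taking the ALFALFA bandpass edges as 1335 and 1435 MHz, also an assumption here, gives an OH redshift window of about 0.16 to 0.25. That matches the range quoted in the abstract.

```python
NU_HI = 1420.405751768  # MHz, HI 21 cm rest frequency
NU_OH = 1667.359        # MHz, OH 1667 MHz main line (dominant in OH megamasers)
C_KMS = 299792.458      # speed of light, km/s


def observed_frequency(cz_kms):
    """Sky frequency (MHz) of a line catalogued as HI at velocity cz (optical convention)."""
    return NU_HI / (1.0 + cz_kms / C_KMS)


def oh_redshift(nu_obs_mhz):
    """Redshift implied if the same spectral feature is the OH 1667 MHz line instead."""
    return NU_OH / nu_obs_mhz - 1.0


# Assumed nominal bandpass edges (MHz).
for nu in (1435.0, 1335.0):
    print(nu, round(oh_redshift(nu), 3))
```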



As a major ALFALFA data release, the α.40 catalog presented here supersedes the datasets previously published by our team. In particular, the HI line flux densities reported here reflect further improvements in the parameter-extraction software and better knowledge of the system performance. The ALFALFA reduction pipeline may miss flux for sources that are very large compared to the beam size and offset from the center of the standard grids. Nevertheless, comparing the HI line flux densities reported here with those from pointed single-dish observations, corrected for beam dilution and pointing errors, shows no systematic offsets except for the very largest and very strongest sources. Those sources will need to be evaluated case by case in grids produced and analyzed separately from the standard process and, where applicable, corrected for sidelobe contamination (Dowell 2010; Powell et al. 2011).



The goals and expectations of the ALFALFA survey were outlined in Giovanelli et al. (2005a), and the survey's source sensitivity and reliability were discussed in Saintonge (2007a). As discussed previously, the integrated HI line flux density threshold of a blind HI survey like ALFALFA increases with HI line profile width (Martin et al. 2010; Toribio et al. 2011b). With the large α.40 dataset, we test those expectations and give quantitative descriptions of the completeness and sensitivity of the ALFALFA survey as functions of log W50.

In addition to the highest-quality, highly reliable (Code 1) HI detections, the α.40 catalog in Table 1 also includes lower-S/N sources that coincide in position and redshift with known optical galaxies (the "priors"). The availability of such prior information depends strongly on the selection functions of other surveys. These additional objects should therefore not be used in studies that require stringent consideration of statistical completeness. However, the vast majority are likely to be valid HI detections, so they can be included in studies where the number of sources is most critical (e.g., peculiar velocity studies). Future work will confirm these detections, along with an additional set of low-S/N possible detections that coincide with galaxies of unknown redshift.
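The width dependence of the flux threshold can be illustrated with a two-regime sketch. Below a break width the threshold scales as W50^{1/2}; above it, the threshold scales linearly with W50. The break at log W50 = 2.5 and the coefficients below are illustrative assumptions, chosen so that the two branches meet at the break. They should be replaced with the completeness fit actually derived for the catalog. `s21_threshold` is a hypothetical helper, not part of any ALFALFA software.

```python
import math


def s21_threshold(w50_kms, log_w_break=2.5, a=-1.207, b=-2.457):
    """Integrated flux density threshold (Jy km/s) versus profile width W50 (km/s).

    Scales as W50**0.5 below the break and as W50 above it. The break and the
    coefficients a, b are illustrative assumptions (continuous at the break).
    """
    log_w = math.log10(w50_kms)
    if log_w < log_w_break:
        return 10.0 ** (0.5 * log_w + a)
    return 10.0 ** (log_w + b)


for w in (25.0, 100.0, 316.0, 600.0):
    print(w, round(s21_threshold(w), 3))
```

A narrow-line dwarf and a broad-line giant with the same integrated flux density are therefore not equally detectable, which is why completeness is described as a function of log W50.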

ALFALFA's sensitivity and the thorough understanding of its performance enable a robust measurement of the HIMF, in particular of its faint-end slope α and of the cosmic density of neutral hydrogen, Ω_HI, at z = 0. On the low-mass end of the HIMF, ALFALFA improves on previous blind HI surveys in sample size, angular and spectral resolution, and sampling of cosmic volume, and it avoids the assumption of pure Hubble flow. At the lowest HI masses, ALFALFA's finer velocity resolution is an important factor in obtaining a full count of the gas-rich dwarf population.

On the high-mass end, previous HI surveys have overlooked the locally rare population of very massive HI disks. We have evaluated how missing sources at both the broad and narrow width ends could affect the derived HIMF, particularly in comparison with the HIPASS catalog (Meyer et al. 2004). We conclude that HIPASS failed to recognize the richness of the very high HI mass population for two reasons: it lacked adequate sensitivity at large distances, and it was limited to only 64 MHz of bandpass. Failing to identify the systems with the broadest widths was not the cause. ALFALFA's combination of sensitivity, spectral and angular resolution, and frequency and sky coverage is what yields a robust census of the HI-bearing population at z = 0.

With ALFALFA only 40% complete, we have shown that the 2DSWML and 1/Vmax methods yield HIMF results in good agreement. The loss of significant volume beyond 15000 km s⁻¹, however, degrades the performance of the 2DSWML approach if that region is included. A realistic treatment of distance and flux density uncertainties, translated into mass uncertainties, avoids the strong bias in α and in the shape of the HIMF that an assumption of Hubble flow in the local volume introduces. α.40 does not yet provide a completely representative sampling of the local cosmological volume. Even so, our method for including the impact of large-scale structure is a conservative choice, and future ALFALFA data releases will further improve both statistical and systematic uncertainties. We look forward to completing the ALFALFA survey.
Table 1. Properties of HI Detections

AGC     Name     HI Coords        Opt. Coords      cz     W50(εw)   S21(σ)      S/N   rms   Dist   log M_HI  Codes
                 J2000            J2000            km s⁻¹ km s⁻¹    Jy km s⁻¹         mJy   Mpc    M☉
(1)     (2)      (3)              (4)              (5)    (6)       (7)         (8)   (9)   (10)   (11)      (12)
331061  456-013  000002.5+155220  000002.1+155254  6007   260(45)   1.13(0.09)  6.5   2.40  85.2   9.29      1 I
331405           000003.3+260059  000003.5+260050  10409  315(8)    2.62(0.09)  16.1  2.05  143.8  10.11     1 I
102896           000006.8+281207  000006.0+281207  16254  406(17)   2.37(0.12)  11.2  2.31  227.4  10.46     1 I *
102574           000009.1+280543                   -368   23(3)     1.29(0.08)  11.2  5.05                   9 U *
102975           000012.3+290137                   -367   23(3)     2.85(0.07)  26.7  4.69                   9 U *
102571           000017.2+272359  000017.3+272403  4654   104(3)    2.00(0.06)  19.0  2.29  65.9   9.31      1 I
102976           000019.0+285931                   -365   26(2)     2.53(0.11)  18.3  5.76                   9 U *
102728           000021.2+310038  000021.4+310119  566    21(6)     0.31(0.03)  7.5   1.92  9.1    6.78      1 I
102575           000028.0+280845                   -371   33(7)     0.47(0.03)  8.6   2.11                   9 U *
12896   478-010  000030.1+261928  000031.4+261931  7653   170(10)   3.14(0.08)  22.0  2.44  104.5  9.91      1 I *
102729           000032.1+305152  000032.0+305209  4618   53(6)     0.70(0.04)  10.5  2.02  65.4   8.85      1 I
102576           000035.3+262712                   -430   21(2)     0.60(0.04)  11.7  2.50                   9 U *
102730           000040.1+315610  000039.5+315618  12631  79(23)    0.66(0.05)  7.3   2.25  175.8  9.68      1 I
102578           000042.3+263311                   -429   22(3)     0.67(0.04)  12.8  2.44                   9 U *
101866           000050.1+141612  000047.9+141639  10877  291(149)  0.79(0.11)  4.1   2.52  150.3  9.62      2 I *
12901   499-035  000059.5+285431  000058.9+285441  6896   395(5)    5.03(0.11)  25.2  2.24  93.7   10.02     1 I *
102731  FGC290A  000109.3+305221  000106.4+305247  7366   257(8)    1.33(0.08)  8.9   2.08  100.5  9.50      1 I
102977           000108.7+284738                   -364   22(3)     2.03(0.11)  13.8  6.64                   9 U *
102861           000110.1+320425                   -181   22(1)     7.30(0.06)  55.0  4.69                   9 U *
102732           000114.8+312218  000115.0+312227  12532  292(5)    1.54(0.09)  9.1   2.20  174.3  10.04     1 I
101869           000127.1+142431  000131.4+142427  12639  183(16)   1.00(0.09)  6.4   2.57  175.4  9.86      1 I *
102733           000129.8+311418  000130.0+311403  12581  134(12)   1.03(0.08)  8.6   2.29  175.0  9.87      1 I
12911   N7806    000131.5+312629  000130.1+312631  4767   231(23)   1.40(0.08)  9.4   2.19  67.5   9.18      1 I *
331082  433-016  000134.5+150448  000134.0+150454  6368   118(8)    2.72(0.08)  21.4  2.60  85.9   9.67      1 I
748776           000142.4+135019  000141.3+135033  6337   53(5)     0.65(0.05)  8.7   2.27  89.9   9.09      1 I

Note. Table 1 will be available as a datafile. A portion is shown here for guidance regarding its form and content.



Table 2. Comments on Individual Sources

AGC     Cat. ID  HI Code  Comment
102896           1        In region affected by RFI; parameters uncertain; near smaller AGC 102897 (000005.5+281129, unknown cz) at 0.7 arcmin
102574           9        HVC; first of two knots near the top of the grid; see also AGC 102575 at 5.1 arcmin
102975           9        HVC; part of filament that stretches through most of this grid
102976           9        HVC; part of a filament that extends beyond this grid into 0004+29
102575           9        HVC; second of two knots near the top of the grid; see also AGC 102574 at 5.1 arcmin
12896            1        Near AGC 331800 (MCG+04-01-009, 000031.6+261818, cz=7754) at 1.2 arcmin
102576  2-4      9        Compact HVC; one of two nearby knots (the other is AGC 102578)
102578  2-5      9        Compact HVC; one of two nearby knots (the other is AGC 102576)
101866           2        Ambiguous OC; several near including AGC 103024 (000049.5+141532, unknown cz) at 1.2 arcmin; others may be background
12901            1        Small companion at 0.4 arcmin AGC 103021 (000057.5+285427, unknown cz)
102977           9        HVC; faint south end of a filament that stretches through most of this grid
102861           9        HVC 110.7-29.6 part of nice arc
12911            1        Multiple system NGC 7805/6; UGC 12908 = NGC 7805 group; blend?
101869           1        AGC 101869 (000149.5+142623, cz=12568) at 5.7 arcmin
102862           9        HVC 110.5-31.0 part of nice arc
102978           9        HVC; part of filament that stretches through most of this grid
102735           1        Optical identification with bluer galaxy in pair; AGC 102831 (000250.0+281725, unknown cz) at 0.3 arcmin
102863           9        HVC 110.8-30.0 part of nice arc
102979           9        HVC; part of filament that stretches through most of this grid
749126           9        HVC 1-6.04-45.19
102864           9        HVC 110.7-30.7 part of nice arc
749127           9        HVC 105.34-47.32
102981           1        OC identified with larger of pair; second is AGC 103015 (000250.0+281725, unknown cz) at 1.6 arcmin
7                1        OC identified with larger of pair; second is AGC 100849 (000306.3+155834, unknown cz) at 1.2 arcmin
100011           2        Poor spatial and spectral definition

Note. Table 2 will be available as a datafile. A portion is shown here for guidance regarding its form and content.
Table 3. The ALFALFA-SDSS DR7 Cross-reference

AGC     HI Code  SDSS  PhotoObjID            SpecObjID            del           (u-r)  z
(1)     (2)      (3)   (4)                   (5)                  (6)           (7)    (8)      (9)
3310ol  1        1     587730775499407375    21133058207488409b   1 -1 ^7 / (   1.59   0.02002  0.00010
331405  1        1     587740589481525478                         15.11         1.97
102896  1        1     758874370990704887                         15.2(>        2.26
102571  1        1     758874297994314032                         10.10         1.39
102728  1        1     758874299066483769                         18.93         2.04
12896   1        1     758874370400680283                         13.98         1.29
102729  1        1     758874299006548754                         18.32         1.43
102730  1        1     758874299602900817                         16.94         1.55
101806  2        1     587730773351989400    211330580741095424   15.15         2.48   0.03613  0.00010
12901   1              75887437 loooouo loo                       13.69         2.64
102731  1        )     758874372069392715                         16.01         1.70
102732  1              758874299603223055                         14.91         1.93
101869  1              587727221413707929    211330580573323264   15.82         1.85   0.04189  0.00010
102733  1              758874299603288292                         15.85         1.82
12911   1              758874299603222635                         13.25         3.00
331082  1              587730774425796793    211330582490120192   14.87         1.46   0.02123  0.00007
748776  1              587730772815184088                         16.96         1.14
102734  1              758874372605739306                         15.90         1.35
101873  1              587727223561257129    211330582536257536   16.35         2.38   0.04254  0.00009
102735  1              758874299603419848                         18.38         0.67
101877  1              587727221413773686    211330580648820736   16.70         1.40   0.01734  0.00033
102980  1              758874371533635834                         15.87         1.76
12920   1              758874298531316055                         15.13         2.04
100006  1              758874372606001325                         14.28         2.70
100008  1              758874372069982254                         16.35         1.48

Note. — Table 3 will be available as a datafile. A portion is shown here for guidance regarding its form and content.



Table 4. OH Megamaser candidates

AGC      OHM Coords (J2000)    Opt. Coords (J2000)    z_opt     z_OH    cz_21     F_OH         S/N    rms
#        hh mm ss.s+dd mm ss   hh mm ss.s+dd mm ss                      km s^-1   Jy km s^-1          mJy
(1)      (2)                   (3)                    (4)       (5)     (6)       (7)          (8)    (9)
102708   000337.0+253215       000336.1+253204                  0.169   -1335     0.91         5.7    2.33
102850   002958.8+305739       002958.2+305832                  0.172   -596      0.46         6.7    2.09
181310   082311.7+275157       082312.7+275138        0.16783   0.168   -1551     2.17         15.9   2.18
228040   124540.5+070337       124545.7+070347                  0.172   -624      0.33         5.1    2.11
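An OH megamaser at z of roughly 0.16 to 0.25 redshifts its 1667 MHz line into the ALFALFA HI band. If it is catalogued as HI, it appears at an apparent velocity cz_21. The Python sketch below converts that apparent velocity into z_OH. It makes two assumptions: column (6) of Table 4 holds the apparent HI velocity, and the emitting line is the 1667.359 MHz OH main line. It is an illustration of the frequency bookkeeping, not the catalog's own pipeline.

```python
"""Sketch: reinterpret an apparent HI velocity as an OH 1667 MHz redshift."""

C_KMS = 299_792.458   # speed of light [km/s]
NU_HI = 1420.405752   # HI 21 cm rest frequency [MHz]
NU_OH = 1667.359      # OH main-line rest frequency [MHz] (assumed emitting line)


def z_oh_from_cz21(cz21_kms: float) -> float:
    """Redshift of an OH 1667 MHz line observed at the frequency implied by an
    apparent HI velocity cz_21 (optical convention, z_HI = cz_21 / c).

    Both readings must give the same observed frequency:
        nu_obs = NU_HI / (1 + z_HI) = NU_OH / (1 + z_OH)
    hence 1 + z_OH = (NU_OH / NU_HI) * (1 + z_HI).
    """
    z_hi = cz21_kms / C_KMS
    return (NU_OH / NU_HI) * (1.0 + z_hi) - 1.0


if __name__ == "__main__":
    # Apparent velocities taken from column (6) of Table 4
    for agc, cz21 in [(102708, -1335), (102850, -596), (181310, -1551)]:
        print(agc, round(z_oh_from_cz21(cz21), 3))
```

Under these assumptions, the first three rows reproduce their tabulated z_OH values to the quoted precision. For AGC 181310, the result also agrees with the optical redshift z_opt = 0.16783 to within about 10^-4.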



Table 5. HI Mass Function Fit Parameters by Redshift Extent

Sample and                     α              φ*                            log(M*/M☉)     Ω_HI, fit          Ω_HI, points
Fitting Function                              (10^-3 h70^3 Mpc^-3 dex^-1)   + 2 log h70    (×10^-4 h70^-1)    (×10^-4 h70^-1)
1/Vmax, 15,000 km s^-1 (a)     -1.33 (0.04)   3.1 (0.6)                     9.95 (0.05)                       4.4 (0.1)
1/Vmax, 18,000 km s^-1 (a)     -1.34 (0.03)   3.8 (0.6)                     9.92 (0.04)                       4.3 (0.1)
2DSWML, 15,000 km s^-1         -1.34 (0.02)   4.7 (0.3)                     9.96 (0.01)    4.3 (0.3)          4.4 (0.1)
2DSWML, 18,000 km s^-1         -1.26 (0.02)   3.4 (0.2)                     10.00 (0.01)   3.0 (0.2)          3.1 (0.1)

(a) In the 1/Vmax case, pure Schechter functions provide a poor fit to the faint-end slope α, and the sum of a Schechter and
a Gaussian function is used to complete the fit. The Gaussian component parameters are not shown in the table, given that
they are not expected to be physical.
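The Ω_HI, fit values follow from integrating the fitted Schechter function. The minimal sketch below assumes the per-dex form Φ(M) = ln(10) φ* (M/M*)^(α+1) exp(−M/M*), for which the HI mass density is ρ_HI = φ* M* Γ(α + 2). It takes ρ_crit = 3H0²/(8πG) with H0 = 70 km s^-1 Mpc^-1, that is, h70 = 1. The physical constants are entered by hand, and the helper names are hypothetical.

```python
import math

G_SI = 6.67430e-11              # gravitational constant [m^3 kg^-1 s^-2]
MPC_M = 3.0856775814913673e22   # 1 Mpc [m]
MSUN_KG = 1.98847e30            # solar mass [kg]


def rho_crit(h0_kms_mpc: float = 70.0) -> float:
    """Critical density 3 H0^2 / (8 pi G) in Msun Mpc^-3."""
    h0_si = h0_kms_mpc * 1.0e3 / MPC_M                 # [s^-1]
    rho_si = 3.0 * h0_si**2 / (8.0 * math.pi * G_SI)   # [kg m^-3]
    return rho_si * MPC_M**3 / MSUN_KG


def omega_hi(alpha: float, phi_star: float, log_mstar: float,
             h0_kms_mpc: float = 70.0) -> float:
    """Omega_HI implied by a Schechter HIMF.

    phi_star in Mpc^-3 dex^-1 and log_mstar = log10(M*/Msun). The mass
    density phi* M* Gamma(alpha + 2) is finite only for alpha > -2.
    """
    if alpha <= -2.0:
        raise ValueError("HI mass density diverges for alpha <= -2")
    rho_hi = phi_star * 10.0**log_mstar * math.gamma(alpha + 2.0)
    return rho_hi / rho_crit(h0_kms_mpc)


if __name__ == "__main__":
    # 2DSWML, 15,000 km/s row of Table 5, with h70 = 1
    print(f"Omega_HI = {omega_hi(-1.34, 4.7e-3, 9.96):.2e}")
```

For the 2DSWML, 15,000 km s^-1 row, this gives about 4.3 × 10^-4, consistent with the tabulated Ω_HI, fit. The Ω_HI, points column, by contrast, sums the binned data rather than integrating the fitted function.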
Table 6. 2DSWML HIMF Schechter Parameters by Region

Sample and           α              φ*                            log(M*/M☉)     Ω_HI, fit          Ω_HI, points
Fitting Function                    (10^-3 h70^3 Mpc^-3 dex^-1)   + 2 log h70    (×10^-4 h70^-1)    (×10^-4 h70^-1)
North1               -1.35 (0.02)   4.4 (0.3)                     9.98 (0.02)    4.3 (0.4)          4.4 (0.1)
North2               -1.25 (0.04)   5.6 (0.6)                     9.92 (0.02)    4.2 (0.5)          4.3 (0.2)
South                -1.30 (0.04)   4.1 (0.5)                     9.96 (0.3)     3.6 (0.5)          3.5 (0.2)
Whole α.40           -1.34 (0.02)   4.7 (0.3)                     9.96 (0.01)    4.3 (0.3)          4.4 (0.1)



Table 7. 1/Vmax HIMF Schechter Parameters by PSCz Map

PSCz Map          α              log(M*/M☉)
2DSWML Result     -1.33 (0.02)   9.96 (0.02)
PSCz.240.G3.2     -1.33 (0.03)   9.95 (0.04)
PSCz.120.G3.2     -1.39 (0.03)   9.96 (0.05)
PSCz.240.G7.7     -1.44 (0.04)   9.98 (0.06)
Fig. 1. — Sky distribution, in equatorial coordinates on an Aitoff grid projection, of the current α.40 catalog
detections. Upper panel: the "fall ALFALFA sky" (anti-Virgo direction) region; lower panel: the "spring
ALFALFA sky" (Virgo direction) region. Blue, red and green symbols identify the Code 1 (best quality),
2 (priors) and 9 (HVC) sources, respectively. The green diagonal lines in each panel trace the supergalactic
plane and SGL ± 10°.
Fig. 2. — Histograms of the distributions of redshift cz, W50, log S21, log S/N and log M_HI (top to bottom)
for the α.40 catalog sample presented in Table 1.



[Fig. 3 plot: horizontal axis Distance (Mpc), 50 to 250]



Fig. 3. — Spaenhauer diagram for the α.40 catalog sample presented in Table 1. The superposed blue (upper)
curve traces the HIPASS completeness limit, while the red (lower) curve traces that survey's detection limit.
The vertical dashed line indicates the outer limit in distance corresponding to the HIPASS bandpass edge;
HIPASS did not sample any volume at larger distances. The vertical overdensity of points evident at ~17 Mpc is
the Virgo cluster. The paucity of points at ~225 Mpc arises because many nights of ALFALFA observations
are contaminated by strong RFI generated by the FAA radar at the San Juan airport. A less pronounced
gap evident at ~85 Mpc arises from occasional, much milder contamination from a harmonic of the radar
at 1380 MHz and from rare burst events associated with the US Air Force NUclear DETonation detection
(NUDET) system aboard the Global Positioning System (GPS), which transmits at 1381 MHz.
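A Spaenhauer diagram plots HI mass against distance, so any flux limit maps to a lower-envelope curve that rises as D². The sketch below uses the standard optically thin relation M_HI = 2.356 × 10^5 D² S21, with M_HI in M☉, D in Mpc and S21 in Jy km s^-1. For simplicity it assumes a single constant flux limit and pure Hubble-flow distances with H0 = 70 km s^-1 Mpc^-1. The HIPASS limits drawn in the figure are width dependent and are not reproduced here. The function names and the 0.7 Jy km s^-1 limit in the usage example are hypothetical.

```python
import math

H0 = 70.0  # km s^-1 Mpc^-1 (value quoted for the pure Hubble-flow case of Fig. 18)


def hubble_distance_mpc(cz_kms: float, h0: float = H0) -> float:
    """Pure Hubble-flow distance D = cz / H0, ignoring peculiar velocities."""
    return cz_kms / h0


def hi_mass_msun(s21_jy_kms: float, d_mpc: float) -> float:
    """Optically thin HI mass in Msun: M_HI = 2.356e5 * D^2 * S21."""
    return 2.356e5 * d_mpc**2 * s21_jy_kms


def log_mass_limit(d_mpc: float, s21_lim: float) -> float:
    """log10 of the lowest HI mass detectable at distance d_mpc for an
    assumed constant integrated-flux limit s21_lim (Jy km/s)."""
    return math.log10(hi_mass_msun(s21_lim, d_mpc))


if __name__ == "__main__":
    d = hubble_distance_mpc(7000.0)          # a source at cz = 7000 km/s
    print(d, round(math.log10(hi_mass_msun(1.0, d)), 2))
    for dist in (50.0, 100.0, 200.0):        # illustrative limit curve
        print(dist, round(log_mass_limit(dist, 0.7), 2))
```

Because the mass limit scales as D², each doubling of distance raises the limiting log M_HI by about 0.6 dex. This is the slope of the curves bounding the lower edge of the diagram.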
Fig. 4. — Illustrative examples of issues related to the identification of the OCs of ALFALFA HI sources.
Each panel is a 3' by 3' frame extracted from the Montage data product of SDSS g-band images centered 
on the position of the ALFALFA HI source. In each frame, the superposed circle, of arbitrary size, identifies 
the adopted OC. See text for details of individual cases. 
Fig. 5. — Cone diagrams showing the distribution of α.40 HI sources (blue open circles) and those with optical
redshifts from the SDSS (filled red circles) within the spring sky strip covering +24° < Dec. < +28°. The
upper diagram shows the volume extending over the full ALFALFA bandwidth to 18000 km s^-1 (including
regions impacted by terrestrial interference). The bottom diagram contains only the volume to 9000 km s^-1.
Fig. 6. — Cone diagrams showing the distribution of α.40 HI sources (blue open circles) and those with
reported optical redshifts (filled red circles) within the fall sky strip covering +24° < Dec. < +28°. The upper
diagram shows the volume extending over the full ALFALFA bandwidth to 18000 km s^-1 (including regions
impacted by terrestrial interference). The bottom diagram contains only the volume to 9000 km s^-1. The
lack of coverage by the SDSS is evident in the paucity of optical redshifts in comparison with Figure 5.
Fig. 7. — Gray-scale color-magnitude diagram, based on SDSS DR7 photometry, for the ALFALFA-SDSS
overlap sample, using the model magnitudes and colors as given in Table 3. The x and y ranges are matched
to Figure 2 of Baldry et al. (2004) for comparative purposes. The superposed dashed line is the optimum
divider, given as Equation 11 of that paper, which separates the red sequence from the blue cloud.
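The optimum divider drawn in Fig. 7 is a tanh function of r-band absolute magnitude. The sketch below classifies a galaxy as red sequence or blue cloud from its u − r color and M_r. The coefficients (2.06, 0.244, 20.07, 1.09) are quoted from Baldry et al. (2004) as recalled here and should be checked against that paper before use. Table 3 lists apparent magnitudes, so obtaining M_r requires a distance, which this sketch takes as given. The function names and example galaxies are hypothetical.

```python
import math


def baldry_divider(m_r: float) -> float:
    """u - r colour separating red sequence from blue cloud at absolute
    magnitude m_r, in the tanh form attributed to Baldry et al. (2004), Eq. 11
    (coefficients as recalled here; verify against the paper)."""
    return 2.06 - 0.244 * math.tanh((m_r + 20.07) / 1.09)


def is_red_sequence(u_minus_r: float, m_r: float) -> bool:
    """True if the galaxy lies redward of the divider."""
    return u_minus_r > baldry_divider(m_r)


if __name__ == "__main__":
    # Hypothetical galaxies given as (u - r, M_r)
    for colour, mag in [(2.6, -21.5), (1.4, -18.0), (2.1, -20.07)]:
        label = "red" if is_red_sequence(colour, mag) else "blue"
        print(colour, mag, label)
```

The divider reddens slowly toward bright magnitudes, approaching u − r ≈ 2.30 for the most luminous galaxies and about 1.82 at the faint end.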
[Fig. 8 panels: AGC 102708, AGC 102850, AGC 181310, AGC 228040]
Fig. 8. — Optical images of the four best OHM candidates listed in Table 4. The image of AGC 102850
comes from the DSS2(B), while the others are SDSS g-band; each image is 3' on a side.
Fig. 9. — Comparison of systemic velocity cz⊙ (upper panel) and velocity width measurements W50 (lower
panel) obtained by the ALFALFA survey and values given in the Cornell Digital HI archive (Springob et al.
2005a).



[Fig. 10 panels: top, ⟨Ratio⟩ = 1.03, σ = 0.238, N = 1888; bottom, ⟨Ratio⟩ = 1.10, σ = 0.365, N = 347; horizontal axis log S21 (ALFALFA) (Jy km s^-1)]



Fig. 10. — Top: Comparison of HI line flux density measurements S21 for the 1888 galaxies in common
between α.40 and Springob et al. (2005a). The vertical axis displays the ratio of the HI line flux density
detected by ALFALFA to the corresponding value corrected for source extent and pointing errors (but not
internal HI absorption) reported by Springob et al. (2005a). ALFALFA Code 1 detections are plotted as
blue open symbols, while Code 2 (priors) detections are shown as red filled circles. The flaring of the ratio
at low fluxes is expected. Bottom: Similar comparison with 347 galaxies detected by HIPASS. None of the
Code 2 sources were detected by HIPASS.
Fig. 11. — Three representative examples of the S21 dN/d log S21 distribution as a function of S21, used to
evaluate completeness. Data points with error bars (1σ Poisson) represent the distribution of Code 1 sources
in a low (upper panel), intermediate (middle panel) and high (bottom panel) profile width bin. The downturn of
the distributions at low S21 marks the limit where the survey completeness falls below unity. The red dashed
line corresponds to an error function fit to the data, while the vertical red solid line represents the flux at which
the survey completeness is 90% according to the fit, S21,90%,Code1. Values of S21,90%,Code1 for each width
bin (W50) are used to derive the 90% completeness line of the survey presented in Equation @]. A similar
analysis has been used for the combined catalog of Code 1 and 2 sources.
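The caption does not spell out how the error function is parametrized. The sketch below assumes a completeness curve in log S21 with a hypothetical centre mu and width sigma, whose values for each W50 bin would come from the fit. Given those parameters, it shows how S21,90%,Code1 follows, together with the 50% and 25% levels plotted in Fig. 12. It does not perform the fit itself.

```python
import math
from statistics import NormalDist


def completeness(log_s21: float, mu: float, sigma: float) -> float:
    """Error-function completeness model in log S21:
    C = 0.5 * [1 + erf((log S21 - mu) / (sqrt(2) * sigma))]."""
    return 0.5 * (1.0 + math.erf((log_s21 - mu) / (math.sqrt(2.0) * sigma)))


def s21_at_completeness(level: float, mu: float, sigma: float) -> float:
    """Linear S21 at which the model reaches `level` (0 < level < 1).

    The erf model is a normal CDF in log S21, so it inverts to
    log S21 = mu + sigma * Phi^-1(level)."""
    log_s = mu + sigma * NormalDist().inv_cdf(level)
    return 10.0 ** log_s


if __name__ == "__main__":
    mu, sigma = 0.0, 0.15               # hypothetical fit for one W50 bin
    for level in (0.90, 0.50, 0.25):    # the three limits drawn in Fig. 12
        print(level, round(s21_at_completeness(level, mu, sigma), 3))
```

Writing the erf model as a normal CDF makes the inversion exact, so no root finding is needed to locate the 90% point.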
Fig. 12. — Distribution of α.40 extragalactic sources in the profile width versus integrated flux density
(log W50 vs. log S21) plane. The upper panel shows the distribution of Code 1 detections only, while the
lower panel shows the same for the whole α.40 catalog, including Code 1 (blue symbols) and Code 2 (green
symbols) detections. In both panels, the solid red line corresponds to the 90% completeness limit, while
the red dash-dotted line corresponds to the 50% ("sensitivity limit") and the red dotted line to the 25%
("detection limit") completeness limits. See §5 for the analytical expressions for the plotted limits, as well
as for an explanation of the derivation method.
Fig. 13. — The distribution of profile widths in α.40 (open histogram) and HIPASS (filled histogram) for
objects with log M_HI/M☉ > 10.0.
Fig. 14. — The distribution of profile widths W50 in ALFALFA (open circles) and HIPASS (filled circles,
enlarged for visual clarity), for objects with log M_HI/M☉ < 8.0. The overplotted horizontal dashed line
shows the profile width cutoff at 30 km s^-1, the limit for inclusion in the HIPASS catalog.
Fig. 15. — The HIMF found via the 2DSWML method (without jackknife resampling) when Code 2 sources
are included. The best-fit Schechter function is overplotted as a dashed line, with the best-fit parameters
displayed. While Ω_HI and the overall Schechter function shape are not changed, the inclusion of the
additional sources does slightly flatten the faint-end slope compared to results obtained using only Code 1
sources (Table 5).
Fig. 16. — Residuals (best-fit Schechter model subtracted from binned data) of HI mass functions calculated
using only Code 1 sources (top) and both Code 1 and 2 sources (bottom). In both cases, the comparison model
is the fiducial, Code 1-only Schechter function given by Martin et al. (2010). The zero-residual reference line
is overplotted as a dashed line.
Fig. 17. — Primary distances from the literature vs. estimates based only on pure Hubble flow, with the
ALFALFA distance uncertainty estimates overplotted. The dashed line indicates a one-to-one correlation. 
Fig. 18. — The HI mass function obtained via the 2DSWML method when distances, and therefore masses,
are obtained assuming pure Hubble flow with H0 = 70 km s^-1 Mpc^-1. As anticipated by Masters (2005),
the adoption of pure Hubble flow yields an underestimate of the low HI mass slope α.



[Fig. 19 plot: horizontal axis Mass Bin (log M_HI), 5 to 11]



Fig. 19. — The average (mean) mass falling into each HIMF bin. The estimated 1σ uncertainty of a galaxy's
HI mass is overplotted as error bars, along with a dotted line indicating a one-to-one relationship.
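Since M_HI ∝ D² S21, a fractional distance error enters the mass uncertainty twice. This is why distance errors dominate for nearby, low-mass galaxies. The sketch below uses standard first-order error propagation with independent flux and distance errors. This is an assumed treatment, not necessarily the procedure behind the error bars in Fig. 19, and the function name and inputs are hypothetical.

```python
import math


def sigma_log_mhi(s21: float, sigma_s21: float,
                  d_mpc: float, sigma_d_mpc: float) -> float:
    """1-sigma uncertainty in log10 M_HI for M_HI proportional to D^2 * S21.

    First-order propagation with independent flux and distance errors:
    sigma_log = sqrt((sigma_S / S)^2 + (2 sigma_D / D)^2) / ln(10).
    """
    frac = math.hypot(sigma_s21 / s21, 2.0 * sigma_d_mpc / d_mpc)
    return frac / math.log(10.0)


if __name__ == "__main__":
    # Hypothetical nearby dwarf: 10% flux error, distance 2.0 +/- 0.5 Mpc
    print(round(sigma_log_mhi(1.0, 0.1, 2.0, 0.5), 3))
```

In this example, the 25% distance error alone contributes a 50% mass error. This swamps the 10% flux term.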
Fig. 20. — The typical (mean) value of V/Vmax, binned by HI mass. Error bars are Poisson counting
uncertainties. The solid line indicates ⟨V/Vmax⟩ = 0.5, while the dashed line indicates ⟨V/Vmax⟩ =
0.45 for the α.40 sample.
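The reference value ⟨V/Vmax⟩ = 0.5 in Fig. 20 is what a complete sample, uniformly distributed in volume, would give. The sketch below uses Euclidean geometry and treats each source's limiting flux s21_lim as a given input. For ALFALFA, that input would in practice depend on W50 through the completeness limits, which are not modeled here. It also ignores the survey's velocity boundaries and RFI gaps, and all names are hypothetical.

```python
from statistics import fmean


def v_over_vmax(s21: float, s21_lim: float) -> float:
    """Euclidean V/Vmax for a source with flux s21 and limiting flux s21_lim.

    Flux scales as d^-2, so d_max = d * sqrt(s21 / s21_lim) and
    V/Vmax = (d / d_max)^3 = (s21_lim / s21)^1.5."""
    if s21 < s21_lim:
        raise ValueError("source is below its own flux limit")
    return (s21_lim / s21) ** 1.5


def mean_v_over_vmax(fluxes, limits):
    """Mean V/Vmax of a sample; close to 0.5 for a complete, uniform one."""
    return fmean(v_over_vmax(s, lim) for s, lim in zip(fluxes, limits))


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    s_lim, fluxes = 1.0, []
    for _ in range(10_000):
        u = rng.random() or 1e-12        # fraction of the limiting volume
        d_over_dmax = u ** (1.0 / 3.0)   # uniform in volume
        fluxes.append(s_lim / d_over_dmax**2)
    print(round(mean_v_over_vmax(fluxes, [s_lim] * len(fluxes)), 3))
```

The simulated sample is uniform in volume by construction, so its mean should land near 0.5. A value below 0.5, such as the 0.45 marked in Fig. 20, indicates that sources are, on average, closer than uniform filling would predict.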
Fig. 21. — The observed redshift distribution of α.40 galaxies (histogram) compared to the expected
distribution obtained via the survey's selection function.
Fig. 22. — The HIMF estimated for separate subregions of the α.40 catalog via the 2DSWML method, with
Schechter fit parameters. Top panel: results for the α.40.North1 region. Middle panel: same, for the
α.40.North2 region. Bottom panel: same, for the α.40.South sample. See Table 6 for further quantitative
details.
Fig. 23. — The low-mass end of the HIMF, showing its dependence on the chosen PSCz density reconstruction
map. The fiducial 1/Vmax HIMF reported in Martin et al. (2010) is shown as filled circles, with the two other
maps represented by squares and triangles.



